Method for predicting the athletic performance potential of a subject

ABSTRACT

A method for predicting the athletic performance potential of a subject comprising the step of assaying a biological sample from a subject for a genetic variant in linkage disequilibrium with MSTN-66493737 (T/C) SNP. The invention also provides an assay for determining the athletic performance potential of a subject.

The invention relates to a method for predicting the athletic performance potential of a subject.

PRIOR APPLICATION

This is a Continuation of application Ser. No. 13/046,432, filed Mar. 11, 2011, which is a Continuation in Part of 13/063,715, filed May 27, 2011, which is a National Stage Entry of PCT/IE2009/000062, filed Sep. 11, 2009, which claims the benefit of U.S. Provisional Application No. 61/213,125, filed May 8, 2009 and U.S. Provisional Application No. 61/136,533, filed Sep. 11, 2008, all of which are incorporated herein by reference.

INTRODUCTION

Myostatin gene (MSTN) variants have previously been shown to contribute to muscle hypertrophy in a range of mammalian species (Grobet et al. 1997; McPherron et al. 1997; McPherron & Lee 1997; Schuelke et al. 2004; Mosher et al. 2007). In particular, whippet racing dogs that are heterozygous for a MSTN polymorphism have significantly greater racing ability than both homozygous wild-type dogs and dogs homozygous for the mutation, which have an increased musculature that is detrimental to performance (Mosher et al. 2007). Horses, in particular Thoroughbreds, have a very high muscle mass to body weight ratio (55%) compared to other mammalian species (30-40%) (Gunn 1987) and the Thoroughbred genome contains evidence for selection for muscle strength phenotypes (Gu et al. 2009).

The Thoroughbred horse industry is a multi-billion dollar international enterprise engaged in the breeding, training and racing of elite racehorses. A Thoroughbred is a registered racehorse that can trace its ancestry to one of three foundation stallions and the approximately 30 foundation mares entered in The General Studbook, 1791 (Weatherby and Sons 1791). During the 300-year development of the breed racehorses have been intensely selected for athletic phenotypes that enable superior racecourse performance in particular types of races. There are two types of Thoroughbred race: National Hunt races are run over hurdles or steeplechase fences over distances of up to 4.5 miles (7,200 m), while Flat races have no obstacles and are run over distances ranging from five furlongs (⅝ mile or 1,006 m) to 20 furlongs (4,024 m). The International Federation of Horseracing Authorities recognizes five race distance categories: Sprint (5-6.5 f, <1,300 m), Mile (6.51-9.49 f, 1,301-1,900 m), Intermediate (9.5-10.5 f, 1,901-2,112 m), Long (10.51-13.5 f, 2,114-2,716 m) and Extended (>13.51 f, >2,717 m) races (International Federation of Horseracing Authorities Classifications, www.horseracingintfed.com) [Note: 1 furlong=⅛ mile=201.2 meters] and horses that compete in these races are generally termed ‘sprinters’ (<6 furlongs), ‘middle distance’ or ‘milers’ (7-8 f) or ‘stayers’ (>8 f). Similar to their human counterparts, sprint racing Thoroughbreds are generally more compact and muscular than horses suited to longer distance races.
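The distance categories quoted above can be expressed as a simple lookup. The following is a minimal sketch only: the function names are hypothetical, the conversion factor is the 201.2 meters per furlong given in the note above, and the handling of the small gaps between the quoted ranges (for example between 6.5 f and 6.51 f) is an assumption made for illustration.

```python
FURLONG_M = 201.2  # meters per furlong, as stated in the note above


def furlongs_to_metres(furlongs: float) -> float:
    """Convert a race distance in furlongs to meters."""
    return furlongs * FURLONG_M


def ifha_category(furlongs: float) -> str:
    """Return the distance category for a Flat race length in furlongs.

    The quoted ranges leave small gaps (e.g. 6.5 f to 6.51 f); this sketch
    closes them with the inequalities below, which is an assumption.
    """
    if furlongs < 5:
        raise ValueError("the categories quoted above start at 5 furlongs")
    if furlongs <= 6.5:
        return "Sprint"
    if furlongs < 9.5:
        return "Mile"
    if furlongs <= 10.5:
        return "Intermediate"
    if furlongs <= 13.5:
        return "Long"
    return "Extended"


if __name__ == "__main__":
    for f in (5, 8, 10, 12, 16):
        print(f, "f =", round(furlongs_to_metres(f)), "m:", ifha_category(f))
```

The category boundaries are taken from the furlong ranges only; the meter ranges in the text are not used by the sketch.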

A range of approaches has been taken to investigate measurable associations with athletic performance phenotypes in Thoroughbred racehorses including assessment of heart size (Young et al 2005), muscle fibre type (Rivero et al. 2007), musculoskeletal conformation (Love et al 2006), speed at maximum heart rate (Gramkow & Evans 2006), haematological (Revington 1983) and other physiological variables (Harkins et al 1993).

WO2006003436 describes the association between performance and gene variants encoded by the mitochondrial genome. However, mitochondrial DNA (mtDNA) haplotypes are inherited strictly from the maternal parent and therefore relate solely to female contributions to the phenotype. As there is a limited number of mtDNA haplotypes (n=17) in the Thoroughbred population and just 10 females contribute to 74% of present maternal lineages (Cunningham et al 2002) it is unlikely that these haplotype variants have a significant effect as the favourable haplotypes would become ‘fixed’ quickly in a population where there is targeted selection for performance; in addition, the effective population size (of mtDNA variants) is one third that of nuclear-encoded variants (Ballard and Dean 2001, Blier et al 2001, Das 2006, Meiklejohn et al 2007). Also, mtDNA haplotypes can be directly inferred from pedigree information.

It is an object of the invention to provide a method for predicting the athletic performance potential of a subject.

STATEMENTS OF INVENTION

The invention provides a method for predicting the athletic performance potential of a subject comprising the step of:

-   -   assaying a biological sample from a subject for a genetic variant in linkage disequilibrium with MSTN-66493737 (T/C) SNP.

The subject may be an equine. The genetic variant may be located in equine chromosome 18. The genetic variant may be located in the MSTN gene region. The genetic variant may be located in the MSTN gene flanking region. The genetic variant may be chosen from one or more of: BIEC2-417495 SNP, BIEC2-417372 SNP, MSTN Ins227bp mutation, MSTN 3′UTR SNP1, MSTN 3′UTR SNP2, MSTN 3′UTR SNP3, or MSTN 3′UTR SNP4. The genetic variant may be BIEC2-417495 SNP. The presence of a C allele may be indicative of elite athletic performance. The presence of a heterozygous CT genotype may be indicative of elite athletic performance. The presence of a homozygous CC genotype may be indicative of elite athletic performance.

The elite athletic performance may be elite sprinting performance. The biological sample of the subject may be chosen from one or more of: blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample or skin.

The invention also provides an assay for determining the athletic performance potential of a subject comprising the steps of:

-   -   obtaining a biological sample from the subject;
    -   extracting or releasing DNA from the biological sample; and
    -   identifying a genetic variant in linkage disequilibrium with MSTN-66493737 (T/C) SNP in the biological sample

-   wherein the athletic performance potential of the subject is associated with the genetic variant and/or the MSTN-66493737 (T/C) SNP.

The DNA may be genomic DNA.

The assay may further comprise the step of:

-   -   amplifying a target sequence in the extracted or released DNA

-   prior to the step of identifying a genetic variant in linkage disequilibrium with MSTN-66493737 (T/C) SNP.

The subject may be an equine. The genetic variant may be located in equine chromosome 18. The genetic variant may be located in the MSTN gene region. The genetic variant may be located in the MSTN gene flanking region. The genetic variant may be chosen from one or more of: BIEC2-417495 SNP, BIEC2-417372 SNP, MSTN Ins227bp mutation, MSTN 3′UTR SNP1, MSTN 3′UTR SNP2, MSTN 3′UTR SNP3, or MSTN 3′UTR SNP4.

The genetic variant may be BIEC2-417495 SNP. The presence of a C allele may be indicative of elite athletic performance. The presence of a heterozygous CT genotype may be indicative of elite athletic performance. The presence of a homozygous CC genotype may be indicative of elite athletic performance.

The elite athletic performance may be elite sprinting performance.

The biological sample of the subject may be chosen from one or more of: blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample or skin.

The invention further provides a method for predicting the athletic performance potential of a subject comprising the step of:

-   -   assaying a biological sample from a subject for the presence of (i) a MSTN-66493737 (T/C) SNP and (ii) a genetic variant in linkage disequilibrium with the MSTN-66493737 (T/C) SNP.

The subject may be an equine. The genetic variant may be located in equine chromosome 18. The genetic variant may be located in the MSTN gene region. The genetic variant may be located in the MSTN gene flanking region. The genetic variant may be chosen from one or more of: BIEC2-417495 SNP, BIEC2-417372 SNP, MSTN Ins227bp mutation, MSTN 3′UTR SNP1, MSTN 3′UTR SNP2, MSTN 3′UTR SNP3, or MSTN 3′UTR SNP4.

The genetic variant may be BIEC2-417495 SNP. The presence of a C allele in the BIEC2-417495 SNP may be indicative of elite athletic performance. The presence of a heterozygous CT genotype in the BIEC2-417495 SNP may be indicative of elite athletic performance. The presence of a homozygous CC genotype in the BIEC2-417495 SNP may be indicative of elite athletic performance.

The presence of a C allele in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance. The presence of a heterozygous CT genotype in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance. The presence of a homozygous CC genotype in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance.

The elite athletic performance may be elite sprinting performance.

The biological sample of the subject may be chosen from one or more of: blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample or skin.

The invention also provides an assay for determining the athletic performance potential of a subject comprising the steps of:

-   -   obtaining a biological sample from the subject;
    -   extracting or releasing DNA from the biological sample; and
    -   identifying (i) a MSTN-66493737 (T/C) SNP and (ii) a genetic variant in linkage disequilibrium with the MSTN-66493737 (T/C) SNP in the biological sample

-   wherein the athletic performance potential of the subject is associated with the MSTN-66493737 (T/C) SNP and/or the genetic variant.

The DNA may be genomic DNA.

The assay may further comprise the step of:

-   -   amplifying a target sequence in the extracted or released DNA

-   prior to the step of identifying (i) a MSTN-66493737 (T/C) SNP and (ii) a genetic variant in linkage disequilibrium with the MSTN-66493737 (T/C) SNP in the biological sample.

The subject may be an equine. The genetic variant may be located in equine chromosome 18. The genetic variant may be located in the MSTN gene region. The genetic variant may be located in the MSTN gene flanking region. The genetic variant may be chosen from one or more of: BIEC2-417495 SNP, BIEC2-417372 SNP, MSTN Ins227bp mutation, MSTN 3′UTR SNP1, MSTN 3′UTR SNP2, MSTN 3′UTR SNP3, or MSTN 3′UTR SNP4.

The genetic variant may be BIEC2-417495 SNP. The presence of a C allele in the BIEC2-417495 SNP may be indicative of elite athletic performance. The presence of a heterozygous CT genotype in the BIEC2-417495 SNP may be indicative of elite athletic performance. The presence of a homozygous CC genotype in the BIEC2-417495 SNP may be indicative of elite athletic performance.

The presence of a C allele in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance. The presence of a heterozygous CT genotype in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance. The presence of a homozygous CC genotype in the MSTN-66493737 (T/C) SNP may be indicative of elite athletic performance.

The elite athletic performance may be elite sprinting performance.

The biological sample of the subject may be chosen from one or more of: blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample or skin.

The invention further provides a method for predicting the athletic performance potential of a subject comprising the step of assaying a biological sample from a subject for the presence of a DNA polymorphism (SNP or insertion) in the MSTN gene and/or flanking sequences.

The DNA polymorphism may be an insertion polymorphism. The polymorphism may be Chr18g.66495327Ins227bp66495326. The presence of an Ins227bp allele may be indicative of elite athletic performance. The presence of a homozygous Ins227bp/Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance. The biological sample of the subject may be selected from the group comprising: blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample and skin.

The subject may be from a competitive racing species. The subject may be an equine. The subject may be chosen from one or more of a thoroughbred race horse, a standardbred trotter, a French trotter, a Quarter horse, or a competitive jumping horse.

The invention further provides an assay for determining the athletic performance potential of a subject comprising the steps of:

-   -   obtaining a sample;
    -   extracting or releasing DNA from the sample; and
    -   identifying a polymorphism (SNP or insertion) in a target sequence from an MSTN gene associated with athletic performance in the extracted or released DNA

-   wherein the athletic performance potential of a subject is associated with the polymorphism.

The polymorphism may be an insertion polymorphism. The polymorphism may be Chr18g.66495327Ins227bp66495326. The presence of an Ins227bp allele may be indicative of elite athletic performance. The presence of a homozygous Ins227bp/Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance.

The assay may comprise the step of:

-   -   amplifying a target sequence from a gene associated with athletic performance in the extracted or released DNA

-   prior to the step of identifying a DNA polymorphism.

The DNA may be genomic DNA.

The invention also provides an assay for use in determining the athletic performance potential of a subject comprising a detector for detecting the presence of a polymorphism (SNP or insertion) in the MSTN gene and/or flanking sequences.

The polymorphism may be an insertion polymorphism. The polymorphism may be Chr18g.66495327Ins227bp66495326. The presence of an Ins227bp allele may be indicative of elite athletic performance. The presence of a homozygous Ins227bp/Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance.

The invention further provides an assay for determining the athletic potential of a subject comprising the steps of:

-   -   obtaining a sample;
    -   extracting or releasing DNA from the sample; and
    -   identifying the genotype of the Chr18g.66495327Ins227bp66495326 polymorphism in the extracted or released DNA

-   wherein the presence of an Ins227bp allele in the Chr18g.66495327Ins227bp66495326 polymorphism is indicative of elite athletic performance.

The assay may comprise the step of:

-   -   amplifying a target sequence encoding the Chr18g.66495327Ins227bp66495326 polymorphism in the extracted or released DNA

-   prior to the step of identifying the genotype of the Chr18g.66495327Ins227bp66495326 polymorphism.

The presence of a homozygous Ins227bp/Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance.

The DNA may be genomic DNA.

The subject may be from a competitive racing species. The subject may be an equine. The subject may be chosen from one or more of a thoroughbred race horse, a standardbred trotter, a French trotter, a Quarter horse, or a competitive jumping horse.

The invention further provides a MSTN insertion mutation encoded by the DNA sequence of SEQ ID No. 23.

This invention provides DNA-based tests for detecting structural genetic variation in nuclear-encoded genes.

The methods and assays described herein are performed ex vivo and can be considered to be ex vivo or in vitro methods and assays.

Any suitable biological sample which contains genetic material, for example blood, saliva, hair, skin, bone marrow, soft tissue, internal organs, biopsy sample, semen, skeletal muscle tissue and the like, may be used as a biological sample for the methods described herein. Blood and hair samples are particularly suitable as a biological sample.

“Athletic performance” as used herein includes racing such as competitive racing and equestrian sports such as racing, showjumping, trotting, eventing, dressage, endurance events, riding, hunting and the like. The equestrian sports may be competitive sports. Of particular importance is sprint racing performance.

Competitive racing species include equines (horses), camels, dogs, elephants, hares, kangaroos, ostriches, pigeons, Homo sapiens and birds of prey such as hawks or falcons. The competitive racing species may be a competition horse such as a Thoroughbred race horse, Standardbred Trotter, French Trotter, Quarter Horse or a competitive jumping horse.

By “primer” we mean a nucleic acid sequence containing between about 15 to about 40, for example between about 18 to about 25, contiguous nucleotides from a nucleic acid sequence of interest. The primer may be a forward (5′ to 3′) or reverse (3′ to 5′) primer or a primer designed on a complementary nucleic acid sequence to the sequence of interest. In the present invention, the sequence of interest is the genomic sequence of a gene associated with athletic performance, for example myostatin. In one embodiment, the primer may comprise between about 15 to about 40 nucleotides. By “complementary sequence” we mean a sequence that binds to the sequence of interest using conventional Watson-Crick base pairing i.e. adenine binds to thymine and cytosine binds to guanine.
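The Watson-Crick pairing rule and the primer length range given above can be illustrated with a short sketch. The function names and the example sequence are hypothetical and are not primers from this specification; the length bounds are the "about 15 to about 40" nucleotides stated above, treated here as hard limits for simplicity.

```python
# A<->T and C<->G, following the base-pairing rule stated above.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_plausible_primer(seq: str, min_len: int = 15, max_len: int = 40) -> bool:
    """Check that a sequence is DNA and within the stated primer length range."""
    return min_len <= len(seq) <= max_len and set(seq.upper()) <= set("ACGT")


if __name__ == "__main__":
    example = "ATGCATGCATGCATGCATGC"  # hypothetical 20-nt sequence
    print(reverse_complement(example), is_plausible_primer(example))
```

A primer designed on the complementary strand, as described above, would be obtained by applying `reverse_complement` to the corresponding stretch of the sequence of interest.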

In our PCT/IE2009/000062, the entire contents of which is incorporated herein by reference, we describe the association between athletic performance and single nucleotide polymorphisms, for example a single polymorphism (g.66493737C>T) in the myostatin gene. Novel sequence variants were identified by re-sequencing the equine MSTN gene in 24 unrelated Thoroughbred horses using 13 overlapping primer pairs spanning all three exons and 288 bp of the 5′ upstream region. Although no exonic sequence variants were detected, six SNPs were detected in intron 1 of MSTN [nt 66492979-66494807]. There was a highly significant (P=3.70×10⁻⁵) association between g.66493737C>T and elite short distance (≦8 f) racing performance and this association became marginally stronger (P=1.88×10⁻⁵) when the short distance cohort was further subdivided into animals (n=43) that had won their best race over distances ≦7 f. The C allele was twice as frequent in the short distance (≦7 f) as in the long distance (>8 f) cohort (0.72 and 0.36 respectively) corresponding to an odds ratio of 4.54 (95% C.I. 2.23-9.23). The most parsimonious model was the genotypic model (P=1.18×10⁻⁶) indicating that genotypes are predictive of optimum racing distance. Considering best race distance (BRD) as a quantitative trait, we analyzed the data for the elite cohort using the distance (furlongs) of the highest grade or most valuable Group race won as the phenotype (n=79). BRD was highly significantly associated (P=4.85×10⁻⁸) with the g.66493737C>T SNP. This result was independently validated (P=1.91×10⁻⁶) in a re-sampled group of unrelated elite (Group and Listed race winners) Thoroughbreds (n=62) and in a cohort of 37 elite racehorses (P=0.0047) produced by the same trainer. For each genotype we determined the mean BRD in the original sample: C/C mean=6.2±0.8 f; C/T mean=9.1±2.4 f; and T/T mean=10.5±2.7 f.
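As a worked check of the allele frequencies quoted above, an allelic odds ratio can be computed directly from the two C allele frequencies. This is a sketch under the assumption that the odds ratio is the ratio of the allele odds in the two cohorts; the function name is hypothetical. Because it starts from the rounded frequencies 0.72 and 0.36 rather than the underlying allele counts, it gives approximately 4.57, close to but not identical with the reported 4.54.

```python
def allele_odds_ratio(freq_a: float, freq_b: float) -> float:
    """Odds ratio for carrying an allele in cohort A relative to cohort B."""
    odds_a = freq_a / (1.0 - freq_a)
    odds_b = freq_b / (1.0 - freq_b)
    return odds_a / odds_b


if __name__ == "__main__":
    # C allele frequency: short distance cohort vs long distance cohort.
    print(round(allele_odds_ratio(0.72, 0.36), 2))
```

The confidence interval quoted above cannot be reproduced from the frequencies alone, since it depends on the allele counts in each cohort.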

The invention provides structural DNA polymorphisms (including insertion polymorphisms and single nucleotide polymorphisms) that are associated with elite athletic performance. The invention provides a method of predicting the athletic performance of a subject comprising the step of assaying a biological sample from the subject for the presence of a structural DNA polymorphism (SNP or insertion) in MSTN wherein the polymorphism has a significant association with athletic performance, especially sprint racing. According to the invention there is provided a method for predicting the athletic performance potential of a subject comprising the step of assaying a biological sample from a subject for the presence of a polymorphism in the MSTN gene and/or flanking sequences. The polymorphism may be an insertion polymorphism.

The polymorphism may be Chr18g.66495327Ins227bp66495326. The presence of the Ins227bp allele is indicative of elite athletic performance. The presence of a homozygous Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance. The elite athletic performance may be early two-year old performance.

The biological sample of the subject may be selected from the group comprising: blood, saliva, skeletal muscle, skin, semen, biopsy, bone marrow, soft tissue, internal organs and hair.

The subject may be from a competitive racing species. The subject may be an equine such as a Thoroughbred race horse, Standardbred Trotter, French Trotter or Quarter Horse.

The invention further provides an assay for determining the athletic performance potential of a subject comprising the steps of:

-   -   obtaining a sample;
    -   extracting or releasing DNA from the sample; and
    -   identifying a polymorphism (SNP or insertion) in a target sequence from an MSTN gene associated with athletic performance in the extracted or released DNA

-   wherein the athletic performance potential of a subject is associated with the polymorphism.

The polymorphism may be an insertion polymorphism.

The assay may comprise the step of:

-   -   amplifying a target sequence from a gene or upstream region of a gene associated with athletic performance in the extracted or released DNA

-   prior to the step of identifying a DNA polymorphism.

The DNA may be genomic DNA.

The invention further provides an assay for use in determining the athletic performance potential of a subject comprising means for detecting the presence of a polymorphism (SNP or insertion) in the MSTN gene and/or flanking sequences.

The polymorphism may be Chr18g.66495327Ins227bp66495326. The presence of an Ins227bp allele is indicative of elite athletic performance. The presence of a homozygous Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance. The elite athletic performance may be early two-year old performance.

The invention also provides an assay for determining the athletic potential of a subject comprising the steps of:

-   -   obtaining a sample;
    -   extracting or releasing DNA from the sample; and
    -   identifying the genotype of the Chr18g.66495327Ins227bp66495326 polymorphism in the extracted or released DNA

-   wherein the presence of an Ins227bp allele in the Chr18g.66495327Ins227bp66495326 polymorphism is indicative of elite athletic performance.

The assay may comprise the step of:

-   -   amplifying a target sequence encoding the Chr18g.66495327Ins227bp66495326 polymorphism in the extracted or released DNA

-   prior to the step of identifying the genotype of the Chr18g.66495327Ins227bp66495326 polymorphism.

The presence of a homozygous Ins227bp genotype may be indicative of elite athletic performance. The elite athletic performance may be elite sprinting performance.

The DNA may be genomic DNA.

The sample from the subject may be selected from the group comprising: blood, saliva, skeletal muscle, skin, bone marrow, biopsy, soft tissue, semen, internal organ and hair.

The subject may be from a competitive racing species. The subject may be an equine such as a Thoroughbred race horse.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be more clearly understood from the following description of an embodiment thereof, given by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic of the best race distance for each of the three MSTN 66493737 (T/C) SNP genotypes;

FIG. 2 is a bar chart showing the distribution of MSTN 66493737 (T/C) SNP genotypes in Thoroughbred subpopulations;

FIG. 3 is a Manhattan plot of P-values for genotype-phenotype GWAS in short (≦8 f) and middle-long (>8 f) distance elite race winners. The y-axis plots −log₁₀(P-values) and the x-axis plots the physical position of the SNPs sorted by chromosome and chromosome position. The most significant SNP was on chromosome 18 (BIEC2-417495). No SNP remained statistically significant following correction for multiple testing;

FIG. 4 is a Manhattan plot of P-values for quantitative trait GWAS using best race distance as phenotype. The y-axis plots −log₁₀(P-values) and the x-axis plots the physical position of the SNPs sorted by chromosome and chromosome position. A peak of association on chromosome 18 (chr18:65809482-67545806) encompassed a ~1.7 Mb region (shown in FIG. 5). Seven of the chromosome 18 SNPs remained significant following correction for multiple testing. The most significant SNP was BIEC2-417495 (P_(Bonf.)=6.58×10⁻⁵);

FIG. 5 is a regional plot for the 1.8 Mb peak of association on chromosome 18 containing the MSTN and NAB1 genes. Association plot of the 1.8 Mb region encompassing 40 SNPs (diamonds) and the Ins227bp polymorphism (circle) ranging from one SNP upstream to one SNP downstream of the seven SNPs significantly associated with optimum racing distance following correction for multiple testing. The y-axes plot −log₁₀(P-values) for each SNP (diamonds) and r² (blue line (solid line)) between g.66493737C>T and all other SNPs. The x-axis plots the physical position of each SNP in the region. The best SNP, g.66493737C>T, is indicated with a blue diamond (indicated with B). Each SNP is color coded according to the strength of LD with g.66493737C>T: r²≧0.8, red (indicated with R); 0.5≦r²<0.8, orange (indicated with O); 0.2≦r²<0.5, yellow (indicated with Y); r²<0.2, white (indicated with W);

FIG. 6 is a visual representation of haplotype blocks across a 1.7 Mb region on chromosome 18. The g.66493737C>T SNP was included in block 3, BIEC2-417495 was included in block 6; and

FIGS. 7A-C are visual representations of haplotype blocks across a 1.7 Mb region on chromosome 18 generated from samples that are C/C (to represent C-chromosomes), T/T (to represent T-chromosomes) and ALL (i.e. reconstructed from genotypes for C/C, C/T and T/T individuals). Recombinant events are shown in FIG. 7D.
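The plotting conventions described for FIGS. 3 to 5 (a −log₁₀ transform of the P-values on the y-axis, and colour coding of each SNP by its r² with g.66493737C>T) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical and no plotting library is assumed; the r² thresholds are those listed for FIG. 5.

```python
import math


def neg_log10(p_value: float) -> float:
    """Transform a P-value to the -log10 scale used on the Manhattan plot y-axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    return -math.log10(p_value)


def ld_colour(r2: float) -> str:
    """Map r-squared with the reference SNP to the FIG. 5 colour code."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r-squared must be in [0, 1]")
    if r2 >= 0.8:
        return "red"
    if r2 >= 0.5:
        return "orange"
    if r2 >= 0.2:
        return "yellow"
    return "white"


if __name__ == "__main__":
    print(round(neg_log10(6.58e-5), 2), ld_colour(0.86))
```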

DETAILED DESCRIPTION

Intense selection for elite racing performance in the Thoroughbred horse (Equus caballus) has resulted in a number of adaptive physiological phenotypes relevant to exercise; however, the underlying molecular mechanisms responsible for these characteristics are not well understood. Thoroughbred horses have been selected for structural and functional variation contributing to speed and stamina during the three-century development of the breed. The International Federation of Horseracing Authorities recognizes five distance categories: Sprint (5-6.5 furlongs [f], ≦1,300 m), Mile (6.51-9.49 f, 1,301-1,900 m), Intermediate (9.5-10.5 f, 1,901-2,112 m), Long (10.51-13.5 f, 2,114-2,716 m) and Extended (>13.51 f, >2,717 m) races (www.horseracingintfed.com) and it is widely recognized among horse breeders that variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance (Willett 1981). Although environment and training may contribute to the race distance for which a horse is best suited, the genetic contribution to the ability to perform optimally at certain distances is large; the heritability of best distance among Australian racehorses has been estimated as 0.94±0.03 (Williamson & Beilharz 1998). A principal characteristic contributing to the ability of a Thoroughbred to perform well in short distance, sprint races is the extent and maturity of the skeletal musculature. Sprinters are generally shorter, stockier animals with greater muscle mass than animals suited to endurance performance, and generally mature earlier. Performance aptitude for speed and stamina has also been associated with muscle fibre type phenotypes (Rivero et al 1993; Barrey et al 1999) and metabolic adaptations to training (Rivero & Piercy 2008). Variation in cardiovascular function contributing to aerobic capacity may also play a role in distinguishing individuals suited to shorter or longer distance races.

We have previously reported a sequence polymorphism (g.66493737C>T) in the equine myostatin (MSTN) gene strongly associated (P=4.85×10⁻⁸) with optimum racing distance in Thoroughbred racehorses (Hill et al. 2010, the entire contents of which is incorporated herein by reference). In several mammalian species, including cattle, sheep, dogs and horses, muscle hypertrophy phenotypes are associated with sequence variants in the MSTN gene (Grobet et al. 1997; McPherron et al 1997; McPherron & Lee 1997; Schuelke et al 2004; Mosher et al 2007).

Among horses that compete preferentially in short distance (≦7 f) races requiring exceptional speed, the C allele of g.66493737C>T is twice as common as among horses that perform optimally in longer distance (>8 f) races that require more stamina (0.72 and 0.36 respectively). On average the optimum racing distance for C:C horses was 6.2±0.8 f, for C:T horses was 9.1±2.4 f and for T:T horses was 10.5±2.7 f. Furthermore, C:C horses have significantly greater muscle mass than T:T horses at two-years-old.
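The genotype means reported above lend themselves to a simple lookup from a g.66493737C>T genotype call to the mean optimum racing distance observed for that genotype. The sketch below is illustrative only: the names are hypothetical, the values are the sample means and spreads quoted above, and the lookup is a descriptive summary of that sample rather than a validated predictive model.

```python
# Mean optimum racing distance in furlongs (mean, reported +/- value)
# for each g.66493737C>T genotype, as quoted above.
MEAN_BRD_FURLONGS = {
    "CC": (6.2, 0.8),
    "CT": (9.1, 2.4),
    "TT": (10.5, 2.7),
}


def normalise_genotype(genotype: str) -> str:
    """Normalise calls such as 'C:C', 'T/C' or 'tt' to 'CC', 'CT' or 'TT'."""
    alleles = [ch for ch in genotype.upper() if ch.isalpha()]
    if len(alleles) != 2 or any(a not in "CT" for a in alleles):
        raise ValueError(f"not a C/T genotype call: {genotype!r}")
    return "".join(sorted(alleles))


def mean_brd(genotype: str) -> tuple:
    """Return (mean, spread) of best race distance for a genotype call."""
    return MEAN_BRD_FURLONGS[normalise_genotype(genotype)]


if __name__ == "__main__":
    for call in ("C:C", "T/C", "TT"):
        print(call, mean_brd(call))
```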

Skeletal muscle phenotypes clearly play a role in distinguishing distance aptitude, and there is a strong effect of MSTN genotype on distance (Hill et al. 2010, the entire contents of which is incorporated herein by reference). However, heretofore, the effects of additional nuclear gene variants that may contribute to equine performance-related phenotypes have not been investigated. Therefore, we performed a genome-wide SNP-association study using the EquineSNP50 Bead Chip genotyping array in a cohort of elite race winning Thoroughbred horses. Animals were separated into two distinct phenotypic cohorts comprising short distance (≦8 f) and middle-long distance (>8 f) race winners and genetic associations were evaluated using best race distance as a quantitative phenotype. This study was designed to identify additional genetic loci as indicators of race distance aptitude and to establish whether variation at the g.66493737C>T SNP was associated with inter-locus epistatic effects for race distance performance.

The present invention relates to a previously unknown relationship between sequence variants (such as SNPs and insertion polymorphisms) in the MSTN gene and retrospective athletic performance (given as racecourse success i.e. Group winner or non-winner, handicap rating (RPR) and best race distance for Group winners) in Thoroughbred race horses. In some aspects, the invention relates to sequence variants in the MSTN gene and flanking sequences. In some aspects the invention relates to sequence variants in linkage disequilibrium with sequence variants in the MSTN gene.

MSTN

Myostatin is also known as growth/differentiation factor 8 precursor (GDF-8). In several mammalian species (including cattle, sheep and dogs), the double muscling trait is caused by mutations in the myostatin (MSTN) gene. In dogs, MSTN gene mutations in racing whippets have been associated with the ‘bully’ phenotype and heterozygous individuals are significantly faster than individuals carrying the wild-type genotype (Mosher et al 2007). Mutations in the MSTN gene may be associated with athletic power.

We have analysed a number of polymorphisms (including SNPs and insertion polymorphisms) in the MSTN gene for association with athletic performance and have developed a simple DNA based method of predicting the athletic performance potential of a subject based on the novel polymorphisms.

A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n=118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≦8 f) and middle-long distance (>8 f) races. The most significant SNP was located on chromosome 18 (BIEC2-417495), ~690 kb from the gene encoding myostatin (MSTN) [P_(unadj.)=6.96×10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P_(unadj.)=1.61×10⁻⁹; P_(Bonf.)=6.58×10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3′ UTR SNPs and a 227 bp SINE insertion polymorphism in the 5′ UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r²=0.86).
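The relationship between the unadjusted and Bonferroni-corrected P-values above can be illustrated with a short calculation. This sketch assumes the standard Bonferroni correction (unadjusted P multiplied by the number of tests, capped at 1) and assumes the number of tests equals the 40,977 SNPs stated above; the exact number of tests used for the quantitative analysis is not stated here, which may explain why the recomputed value (about 6.60×10⁻⁵) differs slightly from the reported 6.58×10⁻⁵. The function name is hypothetical.

```python
def bonferroni(p_unadj: float, n_tests: int) -> float:
    """Bonferroni-corrected P-value: unadjusted P times number of tests, capped at 1."""
    return min(1.0, p_unadj * n_tests)


if __name__ == "__main__":
    n_snps = 40977  # SNPs evaluated, as stated above (assumed number of tests)
    # Quantitative trait analysis, BIEC2-417495
    print(f"{bonferroni(1.61e-9, n_snps):.3e}")
    # Cohort-based analysis, BIEC2-417495
    print(f"{bonferroni(6.96e-6, n_snps):.3f}")
```

Under the same assumption, the cohort-based P-value corrects to roughly 0.29, which is consistent with the statement that no SNP remained significant after correction in that analysis.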

Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P=1.02×10⁻¹⁰; BIEC2-417495, P_(unadj.)=1.61×10⁻⁹). Functional investigations will be required to determine whether this polymorphism affects putative transcription-factor binding and gives rise to variation in gene and protein expression. Nonetheless, these data demonstrate that the g.66493737C>T SNP provides the most powerful genetic marker for prediction of race distance aptitude in Thoroughbreds.

The invention will be more clearly understood from the following examples.

Examples

Materials and Methods

Subjects

A Thoroughbred is a registered racehorse that can trace its ancestry to one of three foundation stallions and the approximately 30 foundation mares entered in The General Studbook, 1791 (Weatherby and Sons 1791). There are two types of Thoroughbred race: National Hunt races are run over hurdles or steeplechase fences over distances of up to 4.5 miles (7,200 m), while Flat races have no obstacles and are run over distances ranging from five furlongs (⅝ mile or 1,006 m) to 20 furlongs (4,024 m). The highest standard and most valuable elite Flat races are known as Group (Europe and Australasia) or Stakes races (North America). The most prestigious of these races include The Breeders' Cup races (United States), The Kentucky Derby (United States), The Epsom Derby (United Kingdom) et cetera.

Three hundred and fifty Group races are run in Europe (Britain, Ireland (incl. Northern Ireland), France, Germany, Italy) annually, including 84 Group 1, 93 Group 2 and 173 Group 3 races. In the United Kingdom and Ireland 196 Group races are competed annually (43 Group 1, 50 Group 2 and 103 Group 3). Britain has the highest number of Group races (139) in Europe per annum, with 57% run over distances ≦1 mile (1609 meters) and 43% run over distances >1 mile. Australia has approximately 540-550 Group races per season from a total of almost 21,000 races and New Zealand hosts 78 Group races per season. After Group races, Listed races are the next highest grade of race.

Horses that compete over distances <1 mile are known as ‘sprinters’ whereas horses that compete over distances >1 mile are known as ‘stayers’. Horses competing in 1 mile races (‘milers’ and ‘middle distance’) may be considered either sprinters or stayers, and the way in which a race is executed by the rider often reflects the trainer's perception of the horse's ability (‘sprinter’ or ‘stayer’). The International Federation of Horseracing Authorities recognizes five race distance categories: Sprint (5-6.5 f, ≦1,300 m), Mile (6.51-9.49 f, 1,301-1,900 m), Intermediate (9.5-10.5 f, 1,901-2,112 m), Long (10.51-13.5 f, 2,114-2,716 m) and Extended (>13.51 f, >2,717 m), abbreviated S-M-I-L-E [Note: 1 furlong=⅛ mile=201.2 meters].
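By way of illustration only, the five distance categories listed above can be expressed as a simple lookup on distance in furlongs. The function names below are hypothetical, and the handling of distances that fall between the published bands (for example between 13.5 f and 13.51 f) or below 5 f is an assumption made for the sketch.

```python
FURLONG_M = 201.2  # metres per furlong, as given in the note above


def smile_category(furlongs: float) -> str:
    """Return the S-M-I-L-E distance category for a distance in furlongs.

    Band edges follow the furlong ranges quoted in the text; values that
    fall in the small gaps between bands are assigned to the upper band
    (an assumption of this sketch).
    """
    if furlongs <= 6.5:
        return "Sprint"
    if furlongs < 9.5:
        return "Mile"
    if furlongs <= 10.5:
        return "Intermediate"
    if furlongs <= 13.5:
        return "Long"
    return "Extended"


def furlongs_to_metres(furlongs: float) -> float:
    """Convert furlongs to metres using 1 f = 201.2 m."""
    return furlongs * FURLONG_M
```

For example, `smile_category(8)` falls in the Mile band and `furlongs_to_metres(5)` gives the 1,006 m quoted for a five-furlong race.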

A repository of registered Thoroughbred horse blood or hair samples (n>1,400) was collected from stud farms, racing yards and sales establishments in Ireland, Great Britain and New Zealand during 1997 to 2008. Each sample was categorized based on retrospective racecourse performance records. Only horses with performance records in Flat races were included in the study. The study cohort comprised elite Thoroughbreds that had won at least one Group race (Group 1, Group 2 or Group 3) or a Listed race; the highest standard and most valuable elite Flat races are known as Group (Stakes) races and Listed races are the next in status. Only elite race winning horses were included, as elite races are most likely to reflect the truest test for distance. Race records were derived from three sources [Europe race records: The Racing Post on-line database (www.racingpost.co.uk); Australasia and South East Asia race records: Arion Pedigrees (www.arion.co.nz); North America race records: Pedigree Online Thoroughbred database (www.pedigreequery.com)].

Each sample was assigned a best race distance, which was defined as the distance (furlongs, f) of the highest grade of race won [note: 1 furlong=⅛ mile=201.2 meters]. When multiple races of the same grade were won, the distance of the most valuable race, in terms of prize money, was used. A set of elite Thoroughbred samples (n=118) was selected from the repository, mostly comprising samples procured in Ireland and Great Britain (i.e. n=5 samples [n=3 ≦8 f, n=2 >8 f] were collected in New Zealand), though some had won their best race in North America. Animals with excessive consanguinity (within two generations) were avoided and over-representation of popular sires within the pedigrees was minimized as far as possible. One hundred and seven sires were represented in the total sample set. For the case-control investigation we compared two cohorts: samples were subdivided into short (≦8 f, n=68) and middle-long (>8 f, n=50) distance elite race winning cohorts (Table 1 below).
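The assignment rule described above (highest grade of race won, ties broken by prize money, then a split at 8 f) may be sketched as follows. The `Win` record, the grade ordering table and the function names are hypothetical constructs for illustration; they are not part of the described study.

```python
from dataclasses import dataclass

# Lower rank = higher grade of race (ordering taken from the text).
GRADE_RANK = {"Group 1": 0, "Group 2": 1, "Group 3": 2, "Listed": 3}


@dataclass
class Win:
    grade: str          # "Group 1", "Group 2", "Group 3" or "Listed"
    distance_f: float   # race distance in furlongs
    prize_money: float  # value of the race, any consistent currency


def best_race_distance(wins):
    """Distance of the highest-grade win; ties go to the most valuable race."""
    best = min(wins, key=lambda w: (GRADE_RANK[w.grade], -w.prize_money))
    return best.distance_f


def distance_cohort(brd_f: float) -> str:
    """Assign the short (<=8 f) or middle-long (>8 f) cohort."""
    return "short" if brd_f <= 8 else "middle-long"
```

A horse with a Listed win over 6 f and two Group 2 wins over 10 f and 12 f would be assigned the distance of whichever Group 2 race carried more prize money.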

TABLE 1 Description of phenotype cohorts

Cohort              N    No. sires  Mean RPR  Range RPR  Mean BRD  Range BRD
All TBs             118  107        116       84-138     8.6       5-16
Short (≦8 f)        68   63         114       84-129     6.8       5-8
Middle-long (>8 f)  50   48         120       107-138    11.3      9-16

All TBs (Thoroughbreds) were used for the quantitative association test analysis. Racing Post Ratings (RPR) represent handicap ratings (best lifetime RPR) that are indicative of performance ability. Best race distance (BRD) was the distance (f) of the highest grade of race (Group 1, 2, 3, Listed) won.

DNA Extraction

Genomic DNA was extracted from either fresh whole blood or hair samples using a modified version of a standard phenol/chloroform method (Sambrook & Russell 2001) or the Maxwell 16 automated DNA purification system (Promega, Wis., USA). DNA samples were quantified using Quant-iT PicoGreen dsDNA kits (Invitrogen, Carlsbad, Calif.) according to the manufacturer's instructions and the DNA concentrations were adjusted to 20 ng/μl.

Detection of Polymorphism

The sequence variant may be determined by any genotyping method including, for example, the following non-limiting methods: direct DNA sequencing; allele size discrimination using gel based assays; single-strand conformation polymorphisms; high-resolution melting of PCR amplicons; matrix-assisted laser-desorption-ionization mass spectrometry.

Genotyping and Quality Control

Samples were genotyped using EquineSNP50 Genotyping BeadChips (Illumina, San Diego, Calif.). This array contains approximately 54,000 SNPs ascertained from the EquCab2 SNP database of the horse genome (Wade et al. 2009) and has an average density of one SNP per 43.2 kb. Genotyping was performed by AROS Applied Biotechnology AS, Denmark. The samples that were genotyped for this study were a subset of n=187 samples genotyped in two separate batches (Batch 1, n=96; Batch 2, n=91). We included four pairs of duplicate samples in Batch 2 for QC purposes and observed greater than 99.9% concordance in the four pairs. In total, we successfully genotyped 53,795 loci. All samples had a genotyping rate of greater than 90%. We omitted from further analysis SNPs which had a genotyping completion rate of less than 90%, were monomorphic or had minor allele frequencies (MAF) less than 5% in our samples. We omitted 12,818 SNPs, leaving 40,977 SNPs in our working build of the data, and the overall genotype completion rate was 99.8%.
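The per-SNP quality control criteria described above (completion rate of at least 90%, not monomorphic, MAF of at least 5%) can be sketched as a single filter. This is a minimal illustration, not the pipeline actually used; the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call) and the function name are assumptions.

```python
def passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return True if a SNP passes the completion-rate and MAF thresholds.

    genotypes: one entry per sample, coded as the number of copies (0, 1, 2)
    of an arbitrary reference allele, or None where the call is missing.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if call_rate < min_call_rate:
        return False
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    # A monomorphic SNP has MAF 0 and therefore fails this test as well.
    return maf >= min_maf


# Counts reported in the text: loci genotyped minus loci omitted.
REMAINING_SNPS = 53_795 - 12_818  # 40,977 SNPs in the working build
```

The final line simply restates the arithmetic in the text: omitting 12,818 of 53,795 loci leaves the 40,977 SNPs used for association testing.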

Re-sequencing MSTN Flanking Sequences

PCR primers were designed to cover ˜2 kb of the 5′ UTR and ˜2 kb of the 3′ UTR of MSTN genomic sequence using the PCR Suite extension to the Primer3 web-based primer design tool (Rozen & Skaletsky 2000; van Baren & Heutink 2004) (Table 2 below). Fifteen unrelated Thoroughbred DNA samples (g.66493737C>T, n=5 C:C; n=5 C:T; n=5 T:T) were included in a re-sequencing panel to identify novel sequence variants. Bidirectional DNA sequencing of PCR products was performed by Macrogen Inc. (Seoul, Korea) using AB 3730xl sequencers (Applied Biosystems, Foster City, Calif.). Sequence variants were detected by visual examination of sequences following alignment using Consed version 19.0 (Gordon et al. 1998).

TABLE 2 PCR and sequencing primers for re-sequencing MSTN flanking sequences

Oligonucleotide Name                    Primer Sequence 5′-3′     SEQ ID No  Structure

Forward and reverse primers for MSTN 3′ UTR PCR and sequencing
PCR Primer 3′ UTR (Forward)             TACTCCCACAAAGATGTCTCCAAT  1
PCR Primer 3′ UTR (Reverse)             TGAATCACCTCCTGCATTAGACT   2
Sequencing Primer 1 3′ UTR (Forward)    GAATGGCTGATGTCATCAGG      3
Sequencing Primer 1 3′ UTR (Reverse)    CCTGATGACATCAGCCATTC      4
Sequencing Primer 2 3′ UTR (Forward)    CAAATCTCAACGTTCCATTG      5
Sequencing Primer 2 3′ UTR (Reverse)    CAATGGAACGTTGAGATTTG      6

Forward and reverse primers for MSTN 5′ UTR PCR and sequencing
PCR Primer 5′ UTR (Forward)             CTGGTTTGTGTCTGGTTTTC      7
PCR Primer 5′ UTR (Reverse)             CTTTTCCTTCCTGCTTACATAC    8
Sequencing Primer 1 5′ UTR (Forward)    AACAAAACAAACAGGCACCC      9          5′ upstream
Sequencing Primer 1 5′ UTR (Reverse)    GGGTGCCTGTTTGTTTTGTT      10         5′ upstream
Sequencing Primer 2 5′ UTR (Forward)    GTCAGGAAAACAAGTTTCTCAAA   11         5′ upstream
Sequencing Primer 2 5′ UTR (Reverse)    TTTGAGAAACTTGTTTTCCTGAC   12         5′ upstream
Sequencing Primer 3 5′ UTR (Forward)    GACAGCGAGATTCATTGTGG      13         5′ upstream + part Exon 1
Sequencing Primer 3 5′ UTR (Reverse)    CCACAATGAATCTCGCTGTC      14         5′ upstream + part Exon 1
Sequencing Primer 4 5′ UTR (Forward)    CCTGTTTGTGCTGATTCTTG      15         5′ upstream + part Exon 1
Sequencing Primer 4 5′ UTR (Reverse)    CAAGAATCAGCACAAACAGG      16         5′ upstream + part Exon 1
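Each internal sequencing primer pair in Table 2 consists of a forward oligonucleotide and its reverse complement, which is what allows both strands to be read from the same site. The short sketch below checks that property for the sequences of SEQ ID Nos. 3 to 6 and 9 to 16; the dictionary keys and function name are illustrative only.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# (forward, reverse) sequencing primers as listed in Table 2.
SEQUENCING_PRIMER_PAIRS = {
    "3' UTR primer 1": ("GAATGGCTGATGTCATCAGG", "CCTGATGACATCAGCCATTC"),
    "3' UTR primer 2": ("CAAATCTCAACGTTCCATTG", "CAATGGAACGTTGAGATTTG"),
    "5' UTR primer 1": ("AACAAAACAAACAGGCACCC", "GGGTGCCTGTTTGTTTTGTT"),
    "5' UTR primer 2": ("GTCAGGAAAACAAGTTTCTCAAA", "TTTGAGAAACTTGTTTTCCTGAC"),
    "5' UTR primer 3": ("GACAGCGAGATTCATTGTGG", "CCACAATGAATCTCGCTGTC"),
    "5' UTR primer 4": ("CCTGTTTGTGCTGATTCTTG", "CAAGAATCAGCACAAACAGG"),
}

# True when every reverse primer is the reverse complement of its forward primer.
ALL_PAIRS_COMPLEMENTARY = all(
    reverse_complement(fwd) == rev for fwd, rev in SEQUENCING_PRIMER_PAIRS.values()
)
```

The same helper could be used to confirm that a primer transcribed from the table has not been corrupted, since a single-base error would break the complementarity.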

Bioinformatics

The software tool MatInspector (Cartharius et al. 2005) was used to search for transcription factor binding site consensus sequences present in 300 bp of the MSTN 5′ UTR region in which a novel SINE insertion (Ins227bp) polymorphism was detected. To investigate possible microRNA (miRNA) regulation of MSTN gene expression we screened the equine MSTN gene and flanking sequences for putative miRNA binding sites. A list of 407 predicted equine miRNAs (Zhou et al. 2009) was entered into the online tool DIANA microtest (http://diana.pcbi.upenn.edu/cgi-bin/micro_t.cgi) and a 14.7 kb segment containing the equine MSTN gene and ˜5 kb of upstream and downstream sequence was entered as the target sequence. SNPInspector (Cartharius et al. 2005) was used to investigate transcription factor binding sites at the g.66493737C>T locus.

Genotyping the Chr18g.66495327Ins227bp66495326 (Ins227bp) Polymorphism

A PCR-based assay for allele size discrimination was used to genotype the Ins227bp polymorphism in n=165 samples. The following primers were used: forward 5′-ATCAGCTGAGCCTTGACTGTAAG-3′ (SEQ ID No. 17) and reverse 5′-TCATCTCTCTGGACATCGTACTG-3′ (SEQ ID No. 18). Alleles were determined as follows: Normal allele, 600 bp; Insertion227bp allele, 827 bp.

Statistical Analyses

All statistical analyses, including tests of association, were performed using PLINK Version 1.05 (Purcell et al. 2007). We compared genotype frequencies in short and middle-long distance cohorts, testing for trait association using χ² tests with two degrees of freedom. To test for population stratification, the pairwise identity-by-state (IBS) distance was calculated for all individuals. A permutation test was performed to investigate IBS differences among the short and middle-long distance cohorts. The linear regression model was used to evaluate quantitative trait association using best race distance (f) as the phenotype. We report uncorrected P-values (P_(unadj.)) and P-values following correction for multiple testing using the Bonferroni method (P_(Bonf.)). Manhattan and Q-Q plots were generated in R using a modified version of code. The regional association plot was generated in R using a modified version of code available at http://www.broadinstitute.org.
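For reference, the Bonferroni method multiplies each unadjusted P-value by the number of tests performed and caps the result at 1. The sketch below assumes the number of tests equals the 40,977 SNPs retained after quality control; it shows the standard definition and does not reproduce the internals of PLINK.

```python
N_TESTS = 40_977  # SNPs in the working build after quality control


def bonferroni(p_unadj: float, n_tests: int = N_TESTS) -> float:
    """Bonferroni-corrected P-value: unadjusted P times number of tests, capped at 1."""
    return min(1.0, p_unadj * n_tests)
```

Under this assumption the corrected values reported in this document are reproduced to rounding: for example, an unadjusted P of 1.02×10⁻¹⁰ corresponds to approximately 4.18×10⁻⁶, and 3.55×10⁻⁸ corresponds to approximately 0.00145.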

Cohort-based association (short vs middle-long distance) and quantitative trait association tests were also performed for the g.66493737C>T SNP (Hill et al. 2010) and a novel 5′ UTR MSTN SINE insertion (Ins227bp) polymorphism identified in this study. In addition, an analysis of genome-wide epistasis was performed in which the g.66493737C>T SNP was tested against all SNPs on the EquineSNP50 Genotyping BeadChip for epistatic interactions influencing best race distance. This test involved a linear regression analysis to investigate whether gene by gene interactions had a significant influence on best race distance. Linkage disequilibrium (LD) between g.66493737C>T and Ins227bp and between g.66493737C>T and all chromosome 18 SNPs on the EquineSNP50 Genotyping BeadChip was quantified as r². A visual representation of haplotype blocks across a 1.7 Mb region on chromosome 18 was generated using Haploview (FIG. 6) (Barrett et al. 2005; Barrett 2009).

Ethics

This work has been approved by the University College Dublin, Ireland, Animal Research Ethics Committee.

Example 1 Genome-Wide SNP-Association Study & Candidate Performance-Associated Genes

Genome-Wide SNP-Association Study

We have previously described an association between optimum racing distance and a SNP (g.66493737C>T) in the equine MSTN gene in Thoroughbred Flat racehorses (Hill et al. 2010, the entire contents of which are incorporated herein by reference). Candidate gene approaches are designed considering a priori hypotheses and do not allow the opportunity for evaluation of the effect of the gene in the context of the entire genome, nor do they allow for the identification of other genes contributing to the phenotype (Tabor et al. 2002; Jorgensen et al. 2009). Therefore, employing a hypothesis-free approach, we investigated genome-wide influences on optimum racing distance by conducting a genome-wide SNP-association study in a cohort of elite Thoroughbred racehorses.

In a cohort-based genotype-phenotype investigation we compared two cohorts: short (≦8 f) and middle-long (>8 f) distance elite race winners. The genome-wide association study (GWAS) results, sorted by chromosome, are shown in FIG. 3. The most significant SNP was on chromosome 18 (BIEC2-417495, P_(unadj.)=6.96×10⁻⁶) and five of the top ten SNPs were located together spanning a 2.4 Mb region on chromosome 18 (chr18:64725066-67186093). However, no SNP in this analysis reached genome-wide significance following correction for multiple testing.

The SNPs identified in chromosome 18 during the horse genome sequencing project and those that are found on the EquineSNP50 BeadChip can be viewed at http://www.broadinstitute.org/ftp/distribution/horse_snp_release/v2/ (file equcab2.0_chr18_snps.xls), the entire contents of which are incorporated herein by reference.

Pairwise IBS values were used to investigate population stratification between the short and middle-long cohorts. While on average phenotypically concordant pairs of individuals were more similar than phenotypically discordant pairs (P=0.034), the overall difference between the two groups was negligible (<0.0002).

Using the linear regression model, we considered best race distance as a quantitative phenotype and observed the same peak of association on chromosome 18 (chr18:65809482-67545806) (FIG. 4). The unadjusted and FDR corrected P values for the quantitative association test results for best race distance are given in additional file 1, which can be downloaded at http://www.biomedcentral.com/1471-2164/11/552, the entire contents of which are incorporated herein by reference. The top eight SNPs encompassed a 1.7 Mb region on chromosome 18 (FIG. 5) and seven reached genome-wide significance following correction for multiple testing (P_(Bonf.)<0.05). The most significant SNP was also the most significant in the cohort-based analysis: BIEC2-417495 (P_(unadj.)=1.61×10⁻⁹; P_(Bonf.)=6.58×10⁻⁵).

Candidate Performance-Associated Genes

We investigated candidate genes in the 1.7 Mb (chr18:65809482-67545806) region on chromosome 18 that encompassed the seven SNPs that reached genome-wide significance. Eleven protein coding genes were identified, including the myostatin gene (MSTN) and the NGFI-A binding protein 1 (EGR1 binding protein 1) gene (NAB1).

The genomic region on chromosome 18 containing the MSTN gene was the highest ranked region in the GWAS for best racing distance, reaching genome-wide significance for a set of seven SNPs within a 1.7 Mb region. The best SNP (BIEC2-417495) and the second best SNP (BIEC2-417372) were 692 kb and 28 kb from the MSTN gene, respectively. We searched the region for other plausible candidate genes and identified the NGFI-A binding protein 1 (EGR1 binding protein 1) gene (NAB1) located ˜170 kb from BIEC2-417495. The product of the NAB1 gene is highly expressed in cardiac muscle and has been reported to be a transcriptional regulator of cardiac growth (Buitrago et al. 2005). Its principal role is in its interaction with the early growth response 1 (EGR-1) transcriptional activator that is involved in regulation of cellular growth and differentiation (Thiel et al. 2000).

We considered NAB1 as a strong candidate gene to influence an athletic performance phenotype as we have previously identified EGR-1 mRNA transcript alterations (+1.6-fold, P=0.014) in skeletal muscle immediately following a bout of treadmill exercise in untrained Thoroughbred horses (McGivney et al. 2009). Twelve SNPs located within the NAB1 genomic sequence (chr18:g.66995249-67021729) are documented in the EquCab2 SNP database, and three are contained on the EquineSNP50 Genotyping BeadChip. After correction for multiple testing, there were no detectable associations between the three NAB1 SNPs and the trait (BIEC2-417453, P_(unadj.)=0.0007, rank 144; BIEC2-417454, P_(unadj.)=0.0012, rank 210; and BIEC2-417458, P_(unadj.)=0.0032, rank 421). Therefore, we did not further consider NAB1 as a potential major contributor to variation in optimum racing distance.

Example 2 Polymorphism Detection in Equine MSTN Flanking Sequences

We have previously identified SNPs in intron 1 of the equine MSTN gene by re-sequencing the coding and intronic sequence [PCT/IE2009/000062 and Hill et al. 2010, the entire contents of which are incorporated herein by reference]. Details of two of the SNPs identified in Intron 1 are shown in Table 3 below.

TABLE 3 SNPs in intron 1 of the MSTN gene Location (bp) on ECA 18 Struc-SNP ID SNP (EquCab2) ture Flanking Sequence MSTN- T:C 66493737 IntronAGCTAAGCAAGTAATTA 66493737 1 GCACAAAAATTTGAATG (T/C) TTATATTCAGGCTATCTSNP CAAAAGTTAGAAAATAC TGTCTTTAGAGCCAGGC TGTCATTGTGAGCAAAATCACTAGCAATTTCTTT TATTTTGGTTCCCCAAG ATTGTTTATAAATAAGG TAAATCTACTCCAGGACTATTTGATAGCAGAGTC ATAAAGGAAAATTA [T/C]TTGGTGCATTAT AACCTGATTACTTAATAAGGAGAACAATATTTTG AAACTGTTGTGTCCTGT TTAAAGTAGATAAAGCA CTGGGTAAAGCAGGATCGCAGACACATGGCACAG AATCTTCCGTGTCATGC CTTCTCTGTGAAGGTGT CTGTCTCCCTTTCCTTGAGTGTAGTTATGAACTG ACTGCAAAAAGAATATA TG (SEQ ID No. 19) MSTN- A:C66494218 Intron AGGAGATTATTAAGCAA 66494218 1 TGTGCCTGCCTGGAAAT (A/C)GTGCACCCCGGGTGCTC SNP TCAACAATAGTACTATG GTCAAGGTGTAAGCAGGACTCTGAGCTATAACCT CTTTGATTAAAATGTTT ATTTATTAGGCATTTTA TGATAATTAGCTCATGATTATCATTATGCTATGT TTACTTCATCATTTTTC TTACTAATACATTA [A/C]ATTTTAAAAAATATTTTTCTAATCTCCAG GGGAATAACTTTCAAAA TCTAATATGTTAATTTG TGAAGAACATAAAAACACTATGAGAAATAGTTTT GAGTAACAGAAGTCATT TTGGTGTTCAGCAAATG CTCAAATGACCTAAACGTCTACAAATTTCTTCCT TCTCTATTATTAGTGAA AAAAACTTGTTATTATA A (SEQ ID No. 20)

Details of two SNPs in the genomic region on chromosome 18 containing the MSTN gene that ranked highly in the GWAS for best racing distance are shown in Table 4 below.

TABLE 4 SNPs from Equine SNP50 GenotypingBead Chips that ranked highly in the GWAS for best racing distance.Location (bp) on ECA 18 Struc- SNP ID SNP (EquCab2) tureFlanking Sequence BIEC2- C:T 67186093 Inter- CATAAGGTCAAATATTT 417495genic TTCCCATTTCCCTCTTT TATTAAAATACCACATT TATTTGGAAAATCATTACTCAGCTCTATTGCTTA CTAATTATTTTAAGATA GAAAAAATATTTTGTCG CAAAGAAAGATTTCAAGACATCTTTATGGCTATA TAAATATTTATGCATCT TTTTAAATACCTTGATT GATTGGTTTTAGA[C/T]TGTCTCAGATTC CATCTGATTTCTCTGCC TCCCTGATAAACCTTCT TCAATCTCTGTTCCCTGGCCTATGAAGGTCACCT TCAAAATATTATCACCT TTATGTAATGATCAGAC ACAAAGTCTAACCATCATCTAAATTATTTCAATA TGAAGCATGACTAATAA ACCAGTATGAGTAGTTT TCAAAGTGAACAGGATTT (SEQ ID No. 21) BIEC2- A:G 66539967 Inter- GCCTGGATATGAAGCCC 417372genic ATAAGAAATGTCTGGCA GTGGTCTCTTGAGATCA GAAAGAGAATGGGAGATTAGGAAGTTAGAATAGG AAGCAAGTGAGGCAGCA GGTAGYGGAGGCTAGGT GGCCCATCTGTGAGTTTTTTCCTTCTGAACTCCT TACAATTCTTTATAAAA TTCCATGAAGGCCTCAT TTCAAGATAAAGG[G/A]GAAGAAAATATT TTCTCCTAAAAAAGCTT AAACTTAATATTCTACT TCTCAAAAAAAATTCAAAGAGGCCTAATAGATTG ACTGGAACTCTAACTGA AATTTGCCTCGCTTTCC CAAATTCTTACTGGAGAAGGGCAAGGCCTCGCCC CTCTCAGAACTCTTACA TGAGATTGCTGCTTTCC TTAGTTTCTGATCACTGT (SEQ ID No. 22)

The structure of the MSTN gene is predicted as follows (Ensembl data)(Table 5)

TABLE 5 Structure of the MSTN gene

               Start bp    End bp      Length bp
5′ upstream    66,495,181
Exon 1         66,494,808  66,495,180  373
Intron 1       66,492,979  66,494,807  1829
Exon 2         66,492,605  66,492,978  374
Intron 2       66,490,589  66,492,604  2016
Exon 3         66,490,208  66,490,588  381
3′ downstream              66,490,207
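The lengths in Table 5 follow from the inclusive coordinates (length = end − start + 1), and adjacent features abut without gaps. A brief check, using only the coordinates from the table (variable and function names are illustrative):

```python
# (feature, start bp, end bp) on ECA18, EquCab2, as listed in Table 5.
MSTN_STRUCTURE = [
    ("Exon 1", 66_494_808, 66_495_180),
    ("Intron 1", 66_492_979, 66_494_807),
    ("Exon 2", 66_492_605, 66_492_978),
    ("Intron 2", 66_490_589, 66_492_604),
    ("Exon 3", 66_490_208, 66_490_588),
]


def length_bp(start: int, end: int) -> int:
    """Length of a feature given inclusive start and end coordinates."""
    return end - start + 1


FEATURE_LENGTHS = {name: length_bp(s, e) for name, s, e in MSTN_STRUCTURE}

# Each feature should begin one base after the next feature in the list ends,
# because the features are listed from higher to lower coordinates.
IS_CONTIGUOUS = all(
    MSTN_STRUCTURE[i][1] == MSTN_STRUCTURE[i + 1][2] + 1
    for i in range(len(MSTN_STRUCTURE) - 1)
)
```

The features are listed from higher to lower coordinates, consistent with the gene lying on the reverse strand of the assembly.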

We re-sequenced 2,155 bp (chr18:66488052-66490207) of the 3′ UTR sequence of the equine myostatin (MSTN) gene in 15 unrelated Thoroughbred horses and identified four novel SNPs (Table 6).

TABLE 6 SNPs identified in 3′ UTR sequence of MSTN

SNP ID  SNP  Location in contig (full length 2139 bp), bp  Location downstream of the protein coding region of MSTN, bp  Location on ECA18 (EquCab2), bp  Structure
SNP1    A:C  701                                           595                                                           66,489,613                       3′ UTR
SNP2    C:T  943                                           837                                                           66,489,371                       3′ UTR
SNP3    A:G  954                                           848                                                           66,489,360                       3′ UTR
SNP4    A:T  2001                                          1895                                                          66,488,313                       3′ UTR
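The three location columns of Table 6 are related by simple offsets. The sketch below assumes, from the numbers themselves, that the downstream distance is counted from the lowest coordinate of Exon 3 (66,490,208 in Table 5) towards lower coordinates, as expected for a gene on the reverse strand; the names used are illustrative.

```python
EXON3_LOW_COORD = 66_490_208  # lowest coordinate of Exon 3 (Table 5)

# SNP ID -> (position in contig, distance downstream of coding region), in bp.
SNP_LOCATIONS = {
    "SNP1": (701, 595),
    "SNP2": (943, 837),
    "SNP3": (954, 848),
    "SNP4": (2001, 1895),
}


def genomic_position(downstream_bp: int) -> int:
    """ECA18 coordinate for a given distance downstream of the coding region."""
    return EXON3_LOW_COORD - downstream_bp


ECA18_POSITIONS = {snp: genomic_position(d) for snp, (_, d) in SNP_LOCATIONS.items()}

# Difference between contig position and downstream distance for each SNP.
CONTIG_OFFSETS = {contig - d for contig, d in SNP_LOCATIONS.values()}
```

Under this assumption the computed coordinates are 66,489,613, 66,489,371, 66,489,360 and 66,488,313, matching the ECA18 locations given for SNP1 to SNP4 in Table 7, and the contig position exceeds the downstream distance by a constant 106 bp.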

Polymorphisms in the 3′ UTR of the MSTN gene have been associated with muscle hypertrophy in sheep and are considered likely to function via creation of de novo target sites for the microRNAs (miRNA) miR-1 and miR-206 (Clop et al. 2006). Therefore, using a set of equine miRNAs (n=407) described by Zhou and colleagues (Zhou et al. 2009), we investigated the presence of putative miRNA binding sites within ˜5 kb upstream and downstream flanking sequences of the MSTN gene. Five putative miRNA binding sites were identified, though none was polymorphic: i.e. no putative miRNA binding site was associated with any of the eight SNP alleles.

We re-sequenced 2,151 bp (chr18:66494683-66496834) of the 5′ UTR sequence of the equine myostatin (MSTN) gene in 15 unrelated Thoroughbred horses.

Re-sequencing was performed using four internal sequencing primers following PCR using the 5′ UTR PCR and sequencing primers listed in Table 2 above (SEQ ID No. 7-16).

Following re-sequencing in the 5′ UTR of the MSTN gene, we identified a 227 bp insertion polymorphism at chr18:66495327-[Insertion227bp]-66495326, located 146 bp from the start of Exon 1 (Exon 1 Start: 66495180).

The insertion sequence is as follows:

(SEQ ID No. 23) GGGGCTGGCCCCGTGGCCGAGTGGTTAAGTTCGTGCGCTCCGCTGCAGGCGGCCCAGTGTTTCGTCGGTTCGAGTCCTGGGCGCGGACATGGCACTGCTCGTCGGACCACGCTGAGGCAGCGTCCCACATGCCACAACTAGAGGAACCCACAACGAAGAATACACAACTATGTACCGGGGGGCTTTGGGGAGAAAAAGGAAAATAAAATCTTTAAAAAGCCACTTGG

A BLAST search identified the insertion sequence as a horse-specific repetitive DNA sequence element (SINE) known as ERE-1 (Sakagami et al. J. Mol. Biol. 239 (5), 731-735 (1994)). MatInspector analysis also indicated that the insertion may disrupt an E-box motif.

Summary of Polymorphisms in the MSTN Flanking Region

We have identified five polymorphisms in the upstream and downstream untranslated (UTR) regions of the MSTN gene: four novel SNPs (i.e. not documented in the EquCab2.0 SNP database) and an insertion polymorphism (not previously documented). Details for these polymorphisms are provided in Table 7 below.

TABLE 7 Details of polymorphisms identified in the MSTN flanking region.Location (bp) on ECA18 Struc- SNP ID SNP (EquCab2) tureFlanking Sequence Inser- Inser- 66495327 5′ TTGTGACAGACAGGGTT tion tion[Insertion UTR TTAACCTCTGACAGCGA 227 bp 227 bp 227 bp] GATTCATTGTGGAGCAG66495326 GAGCCAATCATAGATCC TGACGACACTTGTCTCA TCAAAGTTGGAATATAAAAAGCCACTTGG[GGGG CTGGCCCCGTGGCCGAG TGGTTAAGTTCGTGCGC TCCGCTGCAGGCGGCCCAGTGTTTCGTCGGTTCG AGTCCTGGGCGCGGACA TGGCACTGCTCGTCGGA CCACGCTGAGGCAGCGTCCCACATGCCACAACTA GAGGAACCCACAACGAA GAATACACAACTATGTA CCGGGGGGCTTTGGGGAGAAAAAGGAAAATAAAA TCTTTAAAAAGCCACTT GG]AATACAGTATAAAA GATTCACTGGTGTGGCAAGTTGTCTCTCAGACTG TACAGGCATTAAAATTT TGCTTGGCATTGCTCAA AAGCAAAAGAAAAGTAAAAGGAAGAAATAAGAGC AAGGAAAAAG (SEQ ID No. 38) SNP1 A:C 66489613 3′TATATACCATCATTTTG UTR ATTATCCTTATACACTT GAATTTATATTGTATAATAGCATACTTGGTAAGA TGAAATTCCACAAAAAT AGGAATGGTACACCATA TGCAAGTTTCCATTCCTATTGTGATTGATACAGT ACATTAACAATCCACAC CAATGGTGCTAATACAA ATAGGCTGAATGGCTGATGTCATCAGGTTTAT [C/A]AAATAAAAACAT CCAATAAAATAATGTTT CTCCTTTCTTCAGGTGCATTTTCCAAATGGGGAA TGGATTTTCTTTAATGA AAGAAGAATCATTTTTC TAGAGGTCAGGATTTAATTCTGTAGCATACTTGG AGAAACTGCATTACCTT AAAAGGCAGCCAAAAAG TATTCATTTTTATCAAAATTTCAAAATTGCAGCC TGCTTTTGCAACATTGC AGT (SEQ ID No. 24) SNP2 C:T66489371 3′ ATCCAATAAAATAATGT UTR TTCTCCTTTCTTCAGGT GCATTTTCCAAATGGGGAATGGATTTTCTTTAAT GAAAGAAGAATCATTTT TCTAGAGGTCAGGATTT AATTCTGTAGCATACTTGGAGAAACTGCATTACC TTAAAAGGCAGCCAAAA AGTATTCATTTTTATCA AAATTTCAAAATTGCAGCCTGCTTTTGCAACATT GCAGTTTTTATGATAAA ATAATGGAAA[C/T]GA CTGATTCTGTCAATATTGTATAAAAAGACTTTGA GACAATTGCATTTATAT AATATGTATACAATATT GTTTTTGTAAATAAGCGTCTCCTTTTTTATTTAC TTTGGTATATTTTTACA GTCAGAACATTTCAAAT TAAGTATTAAGGCACAAAGACATGTCATGTATGA CAGAAAAGCAACTGCTT ATATTTCGGGGCAAATT AGCAGATTAAATAGTGGTCTTAAAACTCCATATG CTAATGGTTAGA (SEQ ID No. 
25) SNP3 A:G 66489360 3′ATCCAATAAAATAATGT UTR TTCTCCTTTCTTCAGGT GCATTTTCCAAATGGGGAATGGATTTTCTTTAAT GAAAGAAGAATCATTTT TCTAGAGGTCAGGATTT AATTCTGTAGCATACTTGGAGAAACTGCATTACC TTAAAAGGCAGCCAAAA AGTATTCATTTTTATCA AAATTTCAAAATTGCAGCCTGCTTTTGCAACATT GCAGTTTTTATGATAAA ATAATGGAAATGACTGA TTCT[G/A]TCAATATTGTATAAAAAGACTTTGA GACAATTGCATTTATAT AATATGTATACAATATT GTTTTTGTAAATAAGCGTCTCCTTTTTTATTTAC TTTGGTATATTTTTACA GTCAGAACATTTCAAAT TAAGTATTAAGGCACAAAGACATGTCATGTATGA CAGAAAAGCAACTGCTT ATATTTCGGGGCAAATT AGCAGATTAAATAGTGGTCTTAAAACTCCATATG CTAATGGTTAGATGGTT ATATTACAATCATTTTA TATTTTTTTACATTATTAACATTCACTTATAGAT TC (SEQ ID No. 26) SNP4 A:T 66488313 3′TCAATTTCCAAATGCAT UTR TGCAGTTGGCAAGGGTA TATGGTCCTAGAGTTACAAGTTCTACTGAAGCCA CAGGAACACAGGGAAGC TGCATCTTTTTTTCTAG CACTTAATGATACCAGCACATTTATCTGAGCTTT GGGGGTACCAATTTTCA [A/T]ATTGAATTGAAA AATAATCATAAAGTGCCTAGAAATTCTTAAGTGC AACACTGTACATAAATG TTTTTGAAGTGAACTCT CTTCTCTACTGCTTATCAGTTTAGTAAGTTAGCT ATAAAGCAGTGACTAAG TCTATGAG (SEQ ID No. 27)

Example 3 MSTN Ins227bp Polymorphism (Chr18g.66495327Ins227bp66495326)

This insertion polymorphism is located on Chromosome 18 of Equus caballus at position 66495327Ins227bp66495326 on the reverse strand of the Horse Genome Sequence (Equus caballus Version 2.0), which can be viewed at www.broad.mit.edu/mammals/horse/.

The horse genome EquCab2 assembly is a Whole Genome Shotgun (WGS) assembly at 6.79× and was released in September 2007. A female Thoroughbred named “Twilight” was selected as the representative horse for genome sequencing (Wade C. M., et al. Science 326, 865-7).

Project coordination, genome sequencing and assembly are provided by the Broad Institute. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 size of the contigs is 112.38 kb, and the total length of all contigs is 2.43 Gb. When the gaps between contigs in scaffolds are included, the total span of the assembly is 2.68 Gb. The horse EquCab2 was annotated using a standard Ensembl mammalian pipeline. Predictions from vertebrate mammals as well as horse proteins have been given priority over predictions from non-vertebrate mammals. The set of predictions has been compared to 1:1 homologous genes in human and mouse, and missing homologs in the horse annotation have been recovered using exonerate. Horse and human cDNAs have been used to add UTRs to protein based predictions. The final gene set comprises 20,737 protein-coding genes, 2,863 identified as pseudogenes and 1,580 classified as retro-transposed genes.

Further details of the Ins227bp structural polymorphism are as follows:

- Polymorphism: 66495327Ins227bp66495326
- EquCab2.0 SNP_ID: not detected in EquCab2.0 database. No report of insertion in on-line bioinformatics resources.
- Genomic location of polymorphism: 5′ UTR
- Polymorphism type: Insertion

PCR Gel-Based Assay

A PCR-based assay for allele size discrimination may be designed using the following primers:

MI_F (SEQ ID No. 17) ATCAGCTCACCCTTGACTGTAAC
MI_R (SEQ ID No. 18) TCATCTCTCTGGACATCGTACTG

Normal allele—Product Size 600 bp

Insertion227bp allele—Product size 827 bp
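As an illustration of how the two product sizes might be translated into a genotype call, the sketch below matches observed band sizes against the expected 600 bp and 827 bp products. The size tolerance and the function names are hypothetical; in practice the tolerance would depend on the resolution of the gel or fragment analyser used.

```python
NORMAL_BP = 600      # expected product size, normal allele
INSERTION_BP = 827   # expected product size, Insertion227bp allele


def call_allele(band_bp: float, tolerance_bp: float = 25) -> str:
    """Classify one PCR band as the normal or the insertion allele."""
    if abs(band_bp - NORMAL_BP) <= tolerance_bp:
        return "Normal"
    if abs(band_bp - INSERTION_BP) <= tolerance_bp:
        return "Insertion 227 bp"
    raise ValueError(f"band of {band_bp} bp matches neither expected product")


def call_genotype(band_sizes_bp, tolerance_bp: float = 25) -> str:
    """Call a genotype from the band sizes observed in one sample lane."""
    alleles = {call_allele(b, tolerance_bp) for b in band_sizes_bp}
    if not alleles:
        raise ValueError("no bands observed")
    if len(alleles) == 1:
        allele = next(iter(alleles))
        return f"{allele}/{allele}"
    return "Insertion 227 bp/Normal"
```

A lane showing a single band near 600 bp would be called Normal/Normal, a single band near 827 bp would be called homozygous for the insertion, and two bands would be called heterozygous.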

Example 4 Polymorphisms in Linkage Disequilibrium with MSTN-66493737 SNP

3′ UTR MSTN SNPs

Four SNPs in the 3′ UTR of MSTN (SNPs 1 to 4; see Tables 6 and 7 above) are in linkage disequilibrium with MSTN-66493737 (T/C) and may be used as alternative predictive tests for racing performance, either alone or in combination with MSTN-66493737 and/or other polymorphisms.

Ins227bp Polymorphism

Pairwise tests of linkage disequilibrium (LD) were performed betweenMSTN-g.66493737C/T and Ins227bp.

The LD between MSTN-66493737 and Ins227bp was r²=0.73

In the example below, with one exception (Sample 12), the Ins227bp polymorphism was in complete linkage disequilibrium with the C-allele at MSTN_66493737 (T/C). Sample 12 may represent the result of a recombination event (evidence from heterozygous state at SNP2).

TABLE 8 Linkage disequilibrium of 3′ UTR SNPs and MSTN-66493737 (T/C)

Sample ID  MSTN_66493737 (T/C)  SNP1 (Real)  SNP2  SNP3 (Real)  SNP4  Insertion
7          C:C                  C:C          T:T   A:A          A:A   Insertion 227 bp/Insertion 227 bp
8          C:C                  C:C          T:T   A:A          A:A   Insertion 227 bp/Insertion 227 bp
9          C:C                  C:C          T:T   A:A          A:A   Insertion 227 bp/Insertion 227 bp
11         C:C                  C:C          T:T   A:A          A:A   Insertion 227 bp/Insertion 227 bp
12         C:C                  C:C          C:T   A:A          A:A   Insertion 227 bp/Normal
3          C:T                  A:C          T:T   A:G          A:A   Insertion 227 bp/Normal
4          C:T                  A:C          T:T   A:G          A:A   Insertion 227 bp/Normal
10         C:T                  A:C          T:T   A:G          A:A   Insertion 227 bp/Normal
13         C:T                  A:C          T:T   A:G          A:A   Insertion 227 bp/Normal
14         C:T                  A:C          T:T   A:G          A:A   Insertion 227 bp/Normal
2          T:T                  A:A          T:T   G:G          A:A   Normal/Normal
5          T:T                  A:A          T:T   G:G          A:A   Normal/Normal
6          T:T                  A:A          T:T   G:G          A:A   Normal/Normal
15         T:T                  A:A          T:T   G:G          A:A   Normal/Normal
1          T:T                  A:C          T:T   A:G          A:T   Normal/Normal
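To illustrate how an r² value can be obtained from unphased genotypes such as those in Table 8, the sketch below uses the squared Pearson correlation between per-sample allele counts. This is one common estimator and is an assumption here; it is not necessarily the calculation performed by the software used in the study, and a value from these 15 sequenced samples is not comparable to the r²=0.73 reported for the larger sample set.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Allele counts per sample, in the row order of Table 8
# (samples 7, 8, 9, 11, 12, 3, 4, 10, 13, 14, 2, 5, 6, 15, 1).
C_ALLELE_COUNT = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # MSTN_66493737
INSERTION_COUNT = [2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # Ins227bp

R2_TABLE8 = r_squared(C_ALLELE_COUNT, INSERTION_COUNT)

# Samples whose insertion count equals their C-allele count.
N_CONCORDANT = sum(c == i for c, i in zip(C_ALLELE_COUNT, INSERTION_COUNT))
```

For these 15 samples the estimator gives a value of about 0.91, with 14 of the 15 samples concordant; the single discordant sample is Sample 12.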

In 14 of the 15 sequenced samples, the Ins227bp allele was in concordance with the C-allele at g.66493737C>T. As complete concordance was not observed, we genotyped a set of n=165 samples to determine the extent of concordance between the Ins227bp and g.66493737C>T polymorphisms. We performed parallel association tests for the same set of samples to evaluate the relative performance of the two polymorphisms as predictors of optimum racing distance. The g.66493737C>T SNP performed better in an association test with best race distance (P=5.24×10⁻¹³) than the Ins227bp polymorphism (P=5.54×10⁻¹⁰). Analysis of the sequence surrounding g.66493737C>T indicated that alternate alleles may result in the gain of a putative Homeobox C8/Hox-3alpha transcription factor binding site and/or the disruption of putative Distal-less homeobox 3, E2F and Pdx1 transcription factor binding sites.

Chromosome 18 SNPs

Pairwise tests of linkage disequilibrium (LD) were performed between g.66493737C>T and the 1,373 chromosome 18 SNPs represented on the genotyping array (EquineSNP50 Genotyping BeadChips). LD was highest between g.66493737C>T and BIEC2-417495 (r²=0.86). Seven discrete haplotype blocks were identified in the 1.7 Mb peak of association on chromosome 18. The g.66493737C>T SNP was included in block 3; BIEC2-417495 was included in block 6 (FIG. 6).

SUMMARY

We focused on comprehensively evaluating variation in the MSTN gene by re-sequencing ˜2 kb of the 3′ and 5′ flanking sequences. Four novel 3′ UTR SNPs and a 227 bp SINE insertion (Ins227bp) polymorphism located 146 bp upstream of the coding region start site were identified (see Example 2 above). We investigated whether the 3′ UTR SNPs may abrogate existing or create de novo putative miRNA binding sites, as has been described for MSTN influenced phenotypic variation in Texel sheep (Clop et al. 2006). However, there was no evidence for alterations in putative miRNA binding sites. Next, because of the close proximity to the transcriptional start site, we considered the Ins227bp polymorphism as a strong functional candidate contributing to variation in racing performance. However, a comparative evaluation of association using the same set of samples (n=165) demonstrated that the g.66493737C>T SNP displayed a stronger association (P=5.24×10⁻¹³) with best race distance than the Ins227bp polymorphism (P=5.54×10⁻¹⁰).

An evaluation of LD showed that the strongest association was betweeng.66493737C>T and the most significant SNP in the GWAS study,BIEC2-417495. A comparison of trait association in the same set ofsamples (n=118) confirmed the superior power of the g.66493737C>T SNP(P=1.02×10⁻¹⁰) in the prediction of best race distance when comparedwith BIEC2-417495 (P_(unadj.)=1.61×10⁻⁹). The significance values andgenotype frequencies for the top SNPs in the GWAS and the g.66493737C>TSNP are shown in Table 9. In addition, we investigated whetherg.66493737C>T may interact with other SNPs represented on theEquincSNP50 genotyping array; however, no significant interaction wasobserved to influence best race distance (P>0.0001 for allinteractions). Therefore, the effect of genotype on racing phenotype ishighly likely a result of the previously reported variation in the MSTNgene at locus g.66493737C>T.

TABLE 2 Significance values (unadjusted and Bonferroni corrected P values) for the top SNPs associated with optimum race distance.

CHR  SNP            UNADJ P   BONF. P   A1  A2  A11     A12     A22
18   g.66493737C>T  1.02E−10  N/A       T   C   0.1538  0.5962  0.2500
18   BIEC2-417495   1.61E−09  6.58E−05  T   C   0.1709  0.5983  0.2308
18   BIEC2-417423   3.55E−08  0.001454  G   A   0.1017  0.5169  0.3814
18   BIEC2-417372   6.21E−08  0.002545  G   A   0.0932  0.5424  0.3644
18   BIEC2-417274   8.08E−08  0.003312  T   G   0.1864  0.6017  0.2119
18   BIEC2-417210   3.13E−07  0.01281   C   T   0.2119  0.5763  0.2119
18   BIEC2-417524   4.87E−07  0.01995   G   A   0.1186  0.5763  0.3051
18   BIEC2-417507   5.09E−07  0.02086   C   A   0.1368  0.5897  0.2735

A11: genotype frequency for homozygotes (allele 1) in the population (n=118); A12: genotype frequency for heterozygotes; A22: genotype frequency for homozygotes (allele 2). Correction for multiple testing was not applied for g.66493737C>T; however, the association remains stronger (P_(Bonf.)=4.18×10⁻⁶) after application of a correction factor.
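By way of illustration only, the Bonferroni adjustment referred to above can be sketched as follows. The number of tests is not stated in this section, so the sketch infers it from the ratio of corrected to unadjusted P values tabulated above; the inferred count (roughly 41,000) and the function names are assumptions made for this sketch.

```python
from statistics import median

# (unadjusted P, Bonferroni P) pairs copied from the table above (array SNPs only)
TOP_SNPS = [
    (1.61e-09, 6.58e-05),
    (3.55e-08, 0.001454),
    (6.21e-08, 0.002545),
    (8.08e-08, 0.003312),
    (3.13e-07, 0.01281),
    (4.87e-07, 0.01995),
    (5.09e-07, 0.02086),
]


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def implied_tests(rows):
    """Median ratio of corrected to unadjusted P (an inferred test count)."""
    return median(corrected / unadjusted for unadjusted, corrected in rows)


n_tests = implied_tests(TOP_SNPS)            # assumption: not a reported figure
p_mstn = bonferroni(1.02e-10, n_tests)       # g.66493737C>T, unadjusted P
print(f"implied tests: {n_tests:.0f}; corrected P: {p_mstn:.2e}")
```

Under this assumption the corrected value for g.66493737C>T agrees, to rounding, with the P_(Bonf.) quoted above.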

It is important to note that the sample size used for the present study is relatively small. However, the results of the quantitative trait GWAS demonstrate that the sample size used was sufficient to detect a major genetic effect such as that manifested at the MSTN locus. A lower sample size requirement for GWAS in the Thoroughbred is supported by population genomics analyses of this population in comparison to other horse breeds. These demonstrate that the extent of LD in the Thoroughbred is significantly greater than that measured in other horse populations, being comparable to LD estimates in inbred dog breeds (Wade et al. 2009). The high LD in Thoroughbreds is a reflection of low effective population size, which enables detection of associations with smaller sample sizes.

Example 5 Haplotype Analysis in the Region of MSTN

Genotypes for a subset of n=182 (C/C n=102, T/T n=80) horses were extracted from data generated for a sample of n=368 Thoroughbred DNA samples genotyped using EquineSNP50 Genotyping BeadChips (Illumina, San Diego, Calif.). DNA was quantified using Quant-iT PicoGreen dsDNA kits (Invitrogen, Carlsbad, Calif.) according to the manufacturer's instructions and the DNA concentrations were adjusted to 20 ng/μl. The EquineSNP50 Genotyping BeadChip (Illumina, San Diego, Calif.) contains approximately 54,000 SNPs ascertained from the EquCab2 SNP database of the horse genome and has an average spacing of 43.2 kb between adjacent variants. Genotyping was performed by laboratories at AROS Applied Biotechnology, Denmark and GeneSeek, USA. The samples genotyped for the present study were a subset of samples genotyped in three separate batches (Batch 1, n=96; Batch 2, n=92; Batch 3, n=228). For QC purposes we included four pairs of duplicate samples between Batch 1 and Batch 2, two additional pairs of duplicate samples between Batch 2 and Batch 3 and two pairs of duplicate samples within Batch 3, and observed greater than 99.9% concordance in seven of the eight pairs. A parent-offspring trio was also included to verify Mendelian transmission of SNPs. We successfully genotyped 53,922 loci. All samples had a genotyping rate >90%. We omitted SNPs which had a genotyping completion rate <90%, were monomorphic or had minor allele frequencies (MAF) <5% in our samples. In total, 18,109 SNPs were omitted, leaving 35,813 SNPs in our working build of the data, and the overall genotype completion rate was 99.9%.
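By way of illustration only, the SNP filtering criteria described above (completion rate, monomorphism and MAF) can be sketched as follows. The function names, the 0/1/2 allele-count coding with `None` for a missing call, and the example genotypes are assumptions made for this sketch and are not taken from the study data.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Keep a SNP only if it is well genotyped and sufficiently polymorphic.

    A monomorphic SNP has MAF 0 and is therefore removed by the MAF threshold.
    """
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNPs, one list of per-sample allele counts each
snps = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 1, 2, 0, 1],
    "snp_low_call": [0, 1, None, None, 1, 2, 0, 1, 1, 0],
    "snp_monomorphic": [0] * 10,
    "snp_rare": [0] * 19 + [1],
}
kept = [name for name, calls in snps.items() if passes_qc(calls)]
print(kept)
```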

SNPs spanning a 1.7 Mb region on ECA18 containing the MSTN gene were extracted from the data. Haploview was used to calculate pairwise measures of LD among the 47 SNPs and was employed to create a visual representation of the data. Using the default method, the region was divided into blocks of strong LD using a standard block definition (Gabriel et al., 2002) based on confidence intervals for strong LD and minor allele frequencies >0.05.
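By way of illustration only, the pairwise r² measure of LD can be sketched as below. The sketch assumes phased two-SNP haplotypes with alleles coded 0/1; it is not a description of how Haploview estimates LD from unphased genotype data, and the function name and example haplotypes are hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_at_snp1, allele_at_snp2), alleles coded 0/1.
    r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Alleles always travel together: complete LD
complete_ld = [(0, 0)] * 6 + [(1, 1)] * 4
# All four haplotypes equally frequent: no LD
no_ld = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(r_squared(complete_ld), r_squared(no_ld))
```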

We re-constructed haplotypes in n=204 C-chromosomes and n=160 T-chromosomes in C/C and T/T Thoroughbreds only, for 46 SNPs (BIEC2-417187 to BIEC2-417520) extracted from the EquineSNP50 BeadChip and the MSTN g.66493737C/T variant. The 47-SNP haplotypes (FIG. 7) spanned the 1.7 Mb region at the MSTN gene locus that contained a set of eight SNPs with genome-wide significance of association with best race distance in a previous GWAS (Hill et al. 2010, the entire contents of which is included herein by reference). The C-allele was observed on a single haplotypic background spanning 273 kb (i.e. no variation was detected between BIEC2-417333 and BIEC2-417372), and only minimal variation was detected in a single proximal region (Block 1) located 439 kb upstream of the MSTN g.66493737C/T locus. This indicates haplotype conservation between the Ins227bp and g.66493737C/T polymorphisms on g.66493737C-chromosomes. In contrast, the T-allele arises on a complex genetic background, with multiple haplotype blocks across the region and considerable variation within the haplotype block (Block 4, spanning 484 kb) containing the MSTN g.66493737C/T SNP (see FIG. 7). These data are consistent with a single introduction of the C-allele at the foundation stages of the Thoroughbred.

Further haplotype analysis detected no background variation (MAF >0.05) on C-chromosomes (i.e. g.66493737C) between BIEC2-417333 and BIEC2-417372, i.e. an invariable 273,096 bp haplotype block containing both the Ins227bp polymorphism and the g.66493737C>T SNP.

Example 6 Assays for Predicting the Athletic Performance Potential of a Subject

The test for speed/stamina described in PCT/IE2009/000062, the entire contents of which is incorporated herein by reference, may be designed alternatively using an assay for any genetic variant in linkage disequilibrium with locus MSTN_66493737 (T/C), for example the Ins227bp polymorphism or BIEC2-417495. Alternatively, an assay for predicting the athletic performance potential of a subject may be based on a combination of more than one polymorphism.

Validation of a test for association may be performed by genotyping 192 samples for validation of linkage between Ins227bp and MSTN_66493737 (T/C) and association with retrospective racing performance traits (e.g. best race distance). The Ins227bp genotypes will similarly be predictive of best race distance and may correlate with predictions based on the MSTN_66493737 (T/C) SNP. Examples of prediction of phenotypes are given in FIGS. 1 and 2.

The Ins227bp polymorphism is located 1590 bp from the g.66493737C>T SNP.

The greater association between g.66493737C>T and best race distance than the Ins227bp polymorphism does not preclude the Ins227bp polymorphism being the functional variant. Functional studies will need to be performed to determine the functional variant.

Notwithstanding this, both/either of these polymorphisms may be used to predict best race distance.

Thoroughbred horses excel in both sprint (<1,500 m) and longer distance (>1,800 m) races. Horses competing in middle distance races (‘milers’ and ‘middle distance’) may be considered either ‘sprinters’ or ‘stayers’, and the way in which a race is executed by the rider often reflects the trainer's perception of the sprinting and endurance ability of the horse. Within the industry horses may be described as sprinters based on their conformation; they usually have a stockier and more muscular stature and are faster maturing. They usually race as 2 year olds and over shorter distances as 3 year olds. Individuals perceived to be longer distance animals may be referred to as ‘backward’, requiring more time to mature and running over longer distances as 3 year olds. In some regions (e.g. Australia) breeders attempt to breed only faster ‘sprint’ type horses. For example, in the USA Group 1 races >10 f are limited (9% USA, 23% Australia, 28% Britain) and in Australia 37% of Group 1 races are competed over distances of 5-7 f compared to 20% in the USA and just 12% in Britain. These selection pressures favour C-alleles, which is reflected in the distribution of genotypes among a sample of elite mares and stallions sampled in Australia (n=43; C/C, 0.41; C/T, 0.47; T/T, 0.12).

In some aspects, the invention provides a simple DNA based method (genotype test) for predicting the elite sprint race performance of a Thoroughbred race horse based on the presence or absence of a SNP or other structural DNA variant (e.g. insertion polymorphism) in one or more exercise response genes. For example, the genotype test may be based on a SNP or insertion polymorphism in the MSTN gene and flanking sequences. Details of the SNPs and insertion polymorphism that may be used to predict the elite sprint race performance of a Thoroughbred race horse are given in the appendices. It will be appreciated that the genotypic test may be based on a combination of any one or more of these polymorphisms.
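By way of a worked illustration only, the allele frequencies implied by the Australian genotype frequencies quoted above (C/C, 0.41; C/T, 0.47; T/T, 0.12) can be computed as follows. The function name and the rounding tolerance are assumptions made for this sketch.

```python
def allele_frequencies(f_cc, f_ct, f_tt, tolerance=0.02):
    """C and T allele frequencies from genotype frequencies.

    Each C/C horse carries two C alleles and each C/T horse carries one,
    so freq(C) = f(C/C) + f(C/T) / 2.
    """
    total = f_cc + f_ct + f_tt
    if abs(total - 1.0) > tolerance:  # allow for rounded published values
        raise ValueError("genotype frequencies should sum to about 1")
    freq_c = (f_cc + f_ct / 2) / total
    return {"C": freq_c, "T": 1 - freq_c}


australia = allele_frequencies(0.41, 0.47, 0.12)
print(australia)  # C is the more common allele in this sample
```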

Applications of the Assay

Considering the association between DNA variants such as Chr18g.66495327Ins227bp66495326 and MSTN 66493737 (T/C), the test may be applied in practice in the following ways:

1. Young stock (foals and yearlings)

Informed selection and sales decisions can be made to:

-   identify sprinters
-   identify middle-distance/potential Derby winners with speed
-   identify individuals with enhanced stamina

2. Horses-in-training

Operating costs can be reduced and racing strategy can be fine-tuned by:

-   identifying the most precocious two-year-olds
-   training and racing horses for their optimal racing distance

3. Broodmares

Breeding outcomes can be optimised by:

-   focusing on optimal breeding mares
-   selecting compatible stallions

4. Stallions

A stallion's potential can be promoted by:

-   predicting stamina index for young stallions (5 year advantage)
-   attracting compatible mares to enhance stallion profile

For example, for the Ins227bp polymorphism, for foals, young stock and horses-in-training, selection may be made for individuals most likely to perform well as two year olds (Ins227bp/Ins227bp and Ins227bp/Normal) and against ‘backward’ individuals (industry terminology for less physically developed young Thoroughbreds) that may benefit from waiting to race until they are three years old (Normal/Normal). Breeding objectives may be more confidently met by selecting Ins227bp/Ins227bp individuals for short distance racing, Ins227bp/Normal individuals for middle-distance racing and Normal/Normal individuals for racing requiring greater stamina. For stallion owners, prediction of a stallion's genetic stamina index at the outset of a stud career (five years are required to estimate S.I. from retrospective three year old progeny racing performance) will immediately enhance a young stallion's profile and promote its genetic potential to mare owners. This in turn will enable mare owners, with targeted breeding strategies, to better select stallions to achieve specific breeding objectives. To eliminate uncertainty from a mating outcome (unless both sire and dam are homozygous) it will be necessary to genotype the foal, enabling selection of individuals for a targeted breeding outcome.
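By way of illustration only, the uncertainty in a mating outcome referred to above follows from Mendelian transmission and can be sketched as below. The function name and the "C/T" genotype string format are assumptions made for this sketch; the same logic applies to Ins227bp/Normal genotypes.

```python
from collections import Counter
from itertools import product


def offspring_genotype_probabilities(sire, dam):
    """Return {genotype: probability} for a foal given parental genotypes.

    Genotypes are strings such as "C/T". Each parent transmits one of its
    two alleles with equal probability, giving four equally likely pairs.
    """
    sire_alleles = sire.split("/")
    dam_alleles = dam.split("/")
    counts = Counter(
        "/".join(sorted((s, d))) for s, d in product(sire_alleles, dam_alleles)
    )
    return {genotype: n / 4 for genotype, n in counts.items()}


for sire, dam in [("C/C", "T/T"), ("C/C", "C/T"), ("C/T", "C/T")]:
    print(sire, "x", dam, "->", offspring_genotype_probabilities(sire, dam))
```

Only the mating of two homozygotes gives a single certain foal genotype; every other mating leaves two or three possible genotypes, which is why the foal itself would need to be genotyped.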

Example 7 Application of the Assay to Determine Speed Measured by GPS

We hypothesized that speed parameters measured using field technologies (GPS) in a cohort of horses-in-training may be influenced by g.66493737C>T genotypes at the myostatin locus.

Study Animals and Training Protocol

A subset of horses (n=85) from a group of Thoroughbred Flat racehorses (n=102) previously evaluated from a single training stable for physiological performance parameters during training (March-November) in 2007 and 2008 (Fonseca et al., 2010) was included in the current study. The horses included were chosen based on their training stage and fitness in order to make up the most homogeneous group. The study cohort comprised 55 two-year-olds (18 males and 37 females) and 30 three-year-olds (11 males and 19 females). The criteria for inclusion in the study cohort were that each horse must have completed at least two work days (WD, an exercise workout which simulates a race) prior to the GPS recording (i.e. GPS recordings were taken for ≥3 accumulated WD, accWD) and must have had satisfactory GPS and heart rate (HR) recordings for WD in the training period (March to November 2007 or March to November 2008). The GPS and HR data associated with the greatest number of accWD for each horse were used.

The training protocol for the horses has been described previously (Fonseca et al., 2010). Briefly, horses were trained six days per week on an outdoor all-weather gallop 1,500 m in length with a 2.7% incline for the final 800 m. The training program consisted of progressive stages gradually introducing ‘fast’ workouts (WD) as training progressed. WD generally consisted of gallop distances of 800-1,000 m. Training was modified and adapted to each individual animal based on soundness, fitness and aptitude. Following the onset of WD, horses were entered into competitive races dependent on their perceived fitness and performance. All decisions on the training and racing schedule were made by a single trainer.

Experimental Protocol and Data Collection

Data were recorded only for horses undergoing a WD, as previously described (Fonseca et al. 2010). Each jockey carried a hand-sized GPS unit (GPSports Systems SPI10). After data collection, at the end of each day, the GPS data were downloaded to an equine-specific software programme (Race watch Software, GPSports Systems SPI10). The GPS unit recorded variables including speed, time and distance, as well as the exact map of each horse's exercise. Prior to the onset of the study, the entire gallop had been pre-recorded using one of the GPS units as previously described (Fonseca et al. 2010).

Phenotypes

All speed measurements were recorded from a distance of 800 m from the finish line, as the total distance exercised on a WD differed slightly for each horse. Speed indices evaluated were based on previous work by Fonseca et al. (2010) and included maximal velocity (V_(max)), duration at V_(max) (V_(maxt)), distance (m) travelled during six seconds before V_(max) (V_(maxD6b)), distance (m) travelled during six seconds after V_(max) (V_(maxD6a)) and distance (m) travelled during six seconds before and after V_(max) (V_(maxD6)).
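By way of illustration only, the distance-based speed indices defined above can be derived from a GPS speed trace as sketched below. The sketch assumes one speed sample per second, takes the first occurrence of the maximum as V_(max), excludes the V_(max) sample itself from both windows and sums speed × time to obtain distance; these choices, the function name and the example trace are assumptions and are not taken from Fonseca et al. (2010). V_(maxt) is omitted because its exact definition is not given here.

```python
def speed_indices(speeds, dt=1.0, window_s=6.0):
    """Vmax and distances covered in the windows before and after Vmax.

    speeds: speed samples in m/s taken every dt seconds.
    """
    n = int(round(window_s / dt))          # samples per window
    v_max = max(speeds)
    i = speeds.index(v_max)                # first sample at Vmax
    if i < n or i + n >= len(speeds):
        raise ValueError("trace too short around Vmax for the requested window")
    d6b = sum(speeds[i - n:i]) * dt        # distance in the window before Vmax
    d6a = sum(speeds[i + 1:i + 1 + n]) * dt  # distance in the window after Vmax
    return {"Vmax": v_max, "VmaxD6b": d6b, "VmaxD6a": d6a, "VmaxD6": d6b + d6a}


# Hypothetical 1 Hz trace (m/s) for a single workout
trace = [10, 12, 14, 15, 15.5, 16, 16.2, 16.5, 16.3, 16.1, 16, 15.8, 15.5, 15, 14]
indices = speed_indices(trace)
print(indices)
```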

DNA Extraction and MSTN Polymorphism Genotyping

Genomic DNA was extracted from fresh whole blood using the Maxwell 16 automated DNA purification system (Promega, Wis., USA). Genotyping was carried out using Taqman chemistry on the StepOnePlus™ Real-Time PCR System (Applied Biosystems, Calif., USA). The assay consisted of forward primer 5′-CCAGGACTATTTGATAGCAGAGTCA (SEQ ID No. 28), reverse primer 3′-GACACAACAGTTTCAAAATATTGTTCTCCTT (SEQ ID No. 29) and two allele-specific fluorescent dye labeled probes (VIC-AATGCACCAAGTAATTT (SEQ ID No. 30); 6-FAM-ATGCACCAAATAATTT (SEQ ID No. 31)).

Statistical Analyses

Tests of association were performed using the PLINK Version 1.05 software package (Purcell; Purcell et al. 2007). A linear regression model was used to evaluate quantitative trait association at MSTN g.66493737C>T with the phenotypes V_(max), V_(maxt), V_(maxD6b), V_(maxD6a) and V_(maxD6). The following were included as covariates in the analyses, as they had all been found to contribute to variation in speed indices (Fonseca et al., 2010): Age, Sex, accWD, Jockey and Going.
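By way of illustration only, an additive linear regression of a speed index on genotype with a covariate can be sketched as below. This is a plain least-squares fit written for this sketch, not the PLINK implementation; the coding of genotype as the number of T alleles, the single Age covariate and the noise-free synthetic data are assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, beta_1, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free data: (number of T alleles, age in years) -> Vmax (m/s)
rows = [(g, age) for g in (0, 1, 2) for age in (2, 3)]
y = [16.6 - 0.15 * g + 0.1 * age for g, age in rows]
intercept, beta_genotype, beta_age = ols(rows, y)
print(intercept, beta_genotype, beta_age)
```

In this coding a negative genotype coefficient means that each additional T allele is associated with a lower value of the speed index.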

Results MSTN Genotypes

MSTN g.66493737C>T genotypes were determined for all individuals in the study. There were 21 (24.7%) C/C, 44 (51.8%) C/T and 20 (23.5%) T/T individuals, a distribution of genotypes typical of that previously observed among a large cohort of Flat racehorses (Hill et al., 2010, the entire contents of which is incorporated herein by reference).

MSTN Genotype Association with Speed Indices

MSTN g.66493737C>T genotypes were significantly associated with V_(maxD6) (P=0.0040), V_(maxt) (P=0.0249), V_(max) (P=0.0265) and V_(maxD6a) (P=0.0317) (Table 10). For each speed index the C/C cohort out-performed the C/T and T/T cohorts (Table 11). The mean distance travelled was 3.8 m and 1.2 m greater in the C/C (195.7 m; 98.2 m) than the T/T (191.9 m; 96.9 m) cohort during the 6 seconds before and after V_(max) (V_(maxD6)) and during the 6 seconds after V_(max) (V_(maxD6a)), respectively. V_(max) was 0.31 m/s faster in the C/C (16.6 m/s) cohort than the T/T (16.29 m/s) cohort and V_(max) was maintained (V_(maxt)) for 2.05 s longer in the C/C (7.3 s) than the T/T (5.25 s) cohort.

TABLE 10 Association test results between measured speed variables and the MSTN g.66493737C>T SNP

VARIABLE  TEST      NMISS  BETA     STAT     P
Acc6b     ADD       78     0.0074   0.5426   0.5893
Acc6b     GENO_2DF  78     N/A      0.2990   0.8611
Dist6     ADD       74     −2.4790  −3.1470  0.0026
Dist6     GENO_2DF  74     N/A      11.0500  0.0040
Dist6a    ADD       78     −0.7972  −1.9090  0.0608
Dist6a    GENO_2DF  78     N/A      6.9040   0.0317
Dist6b    ADD       75     0.8968   0.1695   0.8660
Dist6b    GENO_2DF  75     N/A      1.3920   0.4985
Tvmax     ADD       81     −1.0640  −2.1040  0.0392
Tvmax     GENO_2DF  81     N/A      7.3850   0.0249
Vmax      ADD       85     −0.1613  −2.5260  0.0138
Vmax      GENO_2DF  85     N/A      7.2620   0.0265
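By way of illustration only, the P values reported for the GENO_2DF rows of Table 10 are consistent with treating the test statistic as a chi-square variable with two degrees of freedom, for which the upper-tail probability is exp(−x/2). That interpretation of the STAT column is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# (STAT, reported P) for the GENO_2DF rows of Table 10
GENO_2DF = {
    "Acc6b": (0.2990, 0.8611),
    "Dist6": (11.0500, 0.0040),
    "Dist6a": (6.9040, 0.0317),
    "Dist6b": (1.3920, 0.4985),
    "Tvmax": (7.3850, 0.0249),
    "Vmax": (7.2620, 0.0265),
}
recomputed = {name: chi2_sf_2df(stat) for name, (stat, _) in GENO_2DF.items()}
for name, (stat, reported) in GENO_2DF.items():
    print(f"{name}: stat={stat:.4f} recomputed P={recomputed[name]:.4f} reported P={reported}")
```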

TABLE 11 Mean values for speed parameters for each genotype

SPEED INDEX  STATISTIC  T/T    T/C    C/C    C/C − T/T
Dist6 (m)    MEAN       191.9  194.7  195.7  3.8
Dist6a (m)   MEAN       96.93  98.23  98.16  1.23
TVmax (s)    MEAN       5.25   7.366  7.3    2.05
Vmax (m/s)   MEAN       16.29  16.48  16.6   0.31

These data demonstrate that genotypes at the MSTN g.66493737C>T locus have a significant influence on individual differences in speed.

Example 8 MSTN Gene Expression in Resting Skeletal Muscle before and after Training

The MSTN g.66493737C>T SNP has been found to be significantly associated with Thoroughbred horse racing phenotypes, and significant reductions in Thoroughbred skeletal muscle gene expression for three 17 bp transcripts 400-1,500 base pairs downstream of the MSTN gene following a period of training have been observed (McGivney et al. 2010 BMC Genomics, the entire contents of which is incorporated herein by reference). Together these findings suggest that the identified MSTN genotypes may influence MSTN gene expression. To investigate this, MSTN mRNA expression was measured in biopsies from the middle gluteal muscle from 60 untrained yearling Thoroughbreds (C/C, n=15; C/T, n=28; T/T, n=17) using two independent real-time qRT-PCR assays. MSTN gene expression was also evaluated in a subset (n=33) of these animals using muscle RNA samples collected after a ten-month period of training. A significant association was observed between genotype and mRNA abundance for the untrained horses (assay I, P=0.0237; assay II, P=0.003559), with the C/C cohort having the highest MSTN mRNA levels, the T/T group the lowest levels and the C/T group intermediate levels. Following training there was a significant decrease in MSTN mRNA (−3.35-fold; P=6.9×10⁻⁷), which was most apparent for the C/C cohort (−5.88-fold, P=0.001). These results show a significant association between phenotype, genotype and gene expression at the MSTN gene in Thoroughbred racehorses.

MSTN Gene Expression

MSTN mRNA expression in two independent real-time qRT-PCR assays (Table 12) has been investigated in resting skeletal muscle (gluteus medius) from biopsy samples that had been collected for n=60 untrained yearlings (C/C, n=15; C/T, n=28; T/T, n=17).

TABLE 12 Primer sequences for qRT-PCR assays for MSTN gene expression and TTN reference gene expression

Primer Name  Target Gene       Location  Sequence
TTN_FOR      Titin (TTN)       Exon 357  gcatgacacaactggaaagc (SEQ ID No. 32)
TTN_REV      Titin (TTN)       Exon 357  aactttgccctcatcaatgc (SEQ ID No. 33)
MSTN1-2_FOR  Myostatin (MSTN)  Exon 1    tgacagcagtgatggctctt (SEQ ID No. 34)
MSTN1-2_REV  Myostatin (MSTN)  Exon 2    ttgggttttccttccacttg (SEQ ID No. 35)
MSTN2-3_FOR  Myostatin (MSTN)  Exon 2    ttcccaagaccaggagaaga (SEQ ID No. 36)
MSTN2-3_REV  Myostatin (MSTN)  Exon 3    cagcatcgagattctgtgga (SEQ ID No. 37)

We found a significant association with genotype for the MSTN 66493737 (T/C) SNP (P=0.003559). The C/C genotype cohort had higher MSTN mRNA levels (654.3±354.3; 613.7±327.0) than either of the C/T (405.7±234.1; 368.6±213.6) and T/T (350.1±185.5; 348.1±167.2) cohorts (FIG. 10).

It was also found that MSTN gene expression is significantly down-regulated (−4.2-fold, P=0.0043) following a period of training. In the Thoroughbred horse skeletal muscle transcriptome, the greatest reduction in gene expression following a period of training is in MSTN gene expression.

Results from analyses of gene expression generated since our initial report (Hill et al. 2010, the entire contents of which is incorporated herein by reference) of an association between MSTN genomic variation and optimum racing distance in Thoroughbreds support the hypothesis that the MSTN gene is functionally relevant to racing performance variation. In a transcriptome-wide investigation using digital gene expression (DGE) technology, we identified the greatest alteration in mRNA abundance in transcripts from MSTN in Thoroughbred skeletal muscle following a ten-month period of exercise training. Seventy-four annotated transcripts were differentially expressed between pre- and post-training states and, among the 58 genes with decreased expression, MSTN mRNA transcripts were the most significantly reduced (−4.2-fold, P=0.0043) (McGivney et al. 2010, the entire contents of which is incorporated herein by reference).

Example 9

The mechanism by which the g.66493737C>T sequence variant may affect the muscle phenotype in horses is not clear; however, we propose a direct effect of the SNP on the control of myocyte development. Myostatin is a growth and differentiation factor (GDF8) that functions as a negative regulator of skeletal muscle mass development, and variants of the gene result in hypertrophied muscle phenotypes in a range of mammalian species, including horse. Consistent with this role, myostatin has been shown to repress the proliferation and differentiation of cultured myocytes (Thomas et al. 2000; Langley et al. 2002; Joulia et al. 2003). The proliferation of myoblasts is determined by the control and progression of the cell cycle, a role which has been assigned to members of the E2F family of transcription factors (Polager & Ginsberg 2009). The g.66493737C>T SNP is located within the sequence of a putative E2F transcription factor binding site in intron 1 of the MSTN gene. It may therefore be plausible to propose a mechanism by which allele-specific binding of E2F to myostatin influences the growth and development of myocytes following signalling from upstream effector proteins such as retinoblastoma protein (Hallstrom & Nevins 2009). Genotype-specific gene expression studies will shed light on the allele-specific effect on function.

The predictive tests described herein may be applied to select individuals with high or low genetic potential for racing success. These tests can be performed on an individual at any stage in the life cycle, e.g. Day 1 (birth), prior to sales (i.e. yearlings, 2 year olds etc.), during racing career (i.e. from 2 years old) and during breeding (i.e. up to approx. 25 years). Also, the tests may be applied to select appropriate stallion-mare matches for mating based on the genetic make-up of mare and stallion.

Modifications and additions can be made to the embodiments of the invention described herein without departing from the scope of the invention. For example, while the embodiments described herein refer to particular features, the invention includes embodiments having different combinations of features. The invention also includes embodiments that do not include all of the specific features described.

The invention is not limited to the embodiments hereinbefore described, which may be varied in construction and detail.

REFERENCES

-   Ballard J. W. & Dean M. D. (2001) The mitochondrial genome:    mutation, selection and recombination. Curr Opin Genet Dev    11,667-72.-   Barrett J. C., Fry B., Mailer J. & Daly M. J. (2005) Haploview:    analysis and visualization of LD and haplotype maps. Bioinformatics    21,263-5.-   Barrett J. C. (2009) Haploview: Visualization and analysis of SNP    genotype data. Cold Spring Harb Protoc 2009, pdb ip71.-   Barrey E., Valette J. P., Jouglin M., Blouin C. & Langlois B. (1999)    Heritability of percentage of fast myosin heavy chains in skeletal    muscles and relationship with performance. Equine Vet J Suppl    30,289-92.-   Blier P. U., Dufresne F. & Burton R. S. (2001) Natural selection and    the evolution of mtDNA-encoded peptides: evidence for intergenomic    co-adaptation. Trends Genet 17,400-6.-   Bray M S, Hagberg J M, Perusse L, Rankinen T, Roth S M, Wolfarth B,    Bouchard C. The human gene map for performance and health-related    fitness phenotypes: the 2006-2007 update. Med Sci Sports Exerc. 2009    Jan;41(l):35-73.-   Buitrago M., Lorenz K., Maass A. H., Oberdorf-Maass S., Keller U.,    Schmitteckert E. M., Ivashchenko Y., Lohse M. J. &    Engelhardt S. (2005) The transcriptional repressor Nab1 is a    specific regulator of pathological cardiac hypertrophy. Nat Med 11,    837-44.-   Cartharius K., Freeh K., Grote K., Klocke B., Haltmeicr M.,    Klingenhoff A., Frisch M., Bayerlein M. & Werner T. (2005)    Matlnspector and beyond: promoter analysis based on transcription    factor binding sites. Bioinformatics 21, 2933-42.-   Clop A., Marcq F., Takeda H., Pirottin D., Tordoir X., Bibe B.,    Bouix J., Caiment F., Elsen J. M., Eychcnne F., Larzul C., Laville    E., Meish F., Milenkovic D., Tobin J., Charlier C. &    Georges M. (2006) A mutation creating a potential illegitimate    microRNA target site in the myostatin gene affects muscularity in    sheep. Nat Genet 38, 813-8.-   Cunningham E P, Dooley J J, Splan R K, Bradley D G. 
Microsatellite    diversity, pedigree relatedness and the contributions of founder    lineages to Thoroughbred horses. Anim Genet. 2001    December;32(6):360-4.-   Das S. (2006) The role of mitochondrial respiration in physiological    and evolutionaiy adaptation. Bioessays 28, 890-901.-   Dempsey and Wagner 1999 Exercise-induced arterial hypoxemia. J Appl    Physiol. 1999 December;87(6): 1997-2006. Review.PMID: 10601141-   Fukuda R, Zhang H, Kim J W, Shimoda L, Dang C V, Semenza G L.HIF-1    regulates cytochrome oxidase subunits to optimize efficiency of    respiration in hypoxic cells.Cell. 2007 Apr. 6;129(1):111-22.-   Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence    finishing. Genome Res. 1998 March;8(3): 195-202.-   Gramkow and Evans 2006 Gramkow H L, Evans D L. Correlation of race    earnings with velocity at maximal heart rate during a field exercise    test in Thoroughbred racehorses.Equine Vet J Suppl. 2006    August;(36):118-22. PMID: 17402405.-   Grobet L., Martin L. J., Poncelet D., Pirottin D., Brouwers B.,    Riquet J., Schoeberlein A., Dunner S., Menissier F., Massabanda J.,    Fries R., Hanset R. & Georges M. (1997) A deletion in the bovine    myostatin gene causes the double-muscled phenotype in cattle. Nat    Genet 17, 71-4.-   Gu J, Orr N, Park S D, Katz L M, Sulimova G, MacHugh D E, Hill E W.    A genome scan for positive selection in thoroughbred horses. PLoS    One. 2009 Jun. 2;4(6):e5767.-   Gunn H M. Muscle, bone and fat proportions and muscle distribution    of thoroughbreds and quarter horses. In: Gillespie J R, Robinson N E    eds. Equine exercise physiology 2. Davis, C A: ICEEP; 1987:253-264.-   Gunn H.M. (1987) Muscle, bone and fat proportions and muscle    distribution of thoroughbreds and quarter horses. In: Equine    exercise physiology 2: Proceedings of the Second International    Conference on Equine Exercise Physiology; San Diego, Calif. Aug.    7-11, 1986 (eds. By Gillespie J R & Robinson N E), pp. 
xiii, 810p. ICEEP Publications, Davis, Calif.
-   Harkins et al., 1993 Harkins J D, Hackett R P, Ducharme N G. Effect of furosemide on physiologic variables in exercising horses. Am J Vet Res. 1993 December;54(12):2104-9. PMID: 8116946
-   Hallstrom T. C. & Nevins J. R. (2009) Balancing the decision of cell proliferation and cell fate. Cell Cycle 8, 532-5.
-   Hill E. W., McGivney B. A., Gu J., Whiston R. & MacHugh D. E. (2010) A genome-wide SNP-association study confirms a sequence variant (g.66493737C>T) in the equine myostatin (MSTN) gene as the most powerful predictor of optimum racing distance for Thoroughbred racehorses. BMC Genomics 11, 552.
-   Hoppeler and Vogt, 2001 Muscle tissue adaptations to hypoxia. J Exp Biol. 2001 September;204(Pt 18):3133-9. Review. PMID: 11581327
-   Jorgensen T. J., Ruczinski I., Kessing B., Smith M. W., Shugart Y. Y. & Alberg A. J. (2009) Hypothesis-driven candidate gene association studies: practical design and analytical considerations. Am J Epidemiol 170, 986-93.
-   Joulia D., Bernardi H., Garandel V., Rabenoelina F., Vernus B. & Cabello G. (2003) Mechanisms involved in the inhibition of myoblast proliferation and differentiation by myostatin. Exp Cell Res 286, 263-75.
-   Langley B., Thomas M., Bishop A., Sharma M., Gilmour S. & Kambadur R. (2002) Myostatin inhibits myoblast differentiation by down-regulating MyoD expression. J Biol Chem 277, 49831-40.
-   Love S, Wyse C A, Stirk A J, Stear M J, Calver P, Voute L C, Mellor D J. Prevalence, heritability and significance of musculoskeletal conformational traits in Thoroughbred yearlings. Equine Vet J. 2006 Nov;38(7):597-603. PMID: 17228572
-   Martin Flück 2006 Functional, structural and molecular plasticity of mammalian skeletal muscle in response to exercise stimuli. The Journal of Experimental Biology 209, 2239-2248
-   Matoba S, Kang J G, Patino W D, Wragg A, Boehm M, Gavrilova O, Hurley P J, Bunz F, Hwang P M. p53 regulates mitochondrial respiration. Science. 2006 Jun 16;312(5780):1650-3. Epub 2006 May 25.
-   McGivney B. A., Eivers S. S., MacHugh D. E., MacLeod J. N., O'Gorman G. M., Park S. D., Katz L. M. & Hill E. W. (2009) Transcriptional adaptations following exercise in thoroughbred horse skeletal muscle highlights molecular mechanisms that lead to muscle hypertrophy. BMC Genomics 10, 638.
-   McGivney B. A., McGettigan P. A., Browne J. A., Evans A. C., Fonseca R. G., Loftus B. J., Lohan A., MacHugh D. E., Murphy B. A., Katz L. M. & Hill E. W. (2010) Characterization of the equine skeletal muscle transcriptome identifies novel functional responses to exercise training. BMC Genomics 11, 398.
-   McPherron A. C., Lawler A. M. & Lee S. J. (1997) Regulation of skeletal muscle mass in mice by a new TGF-beta superfamily member. Nature 387, 83-90.
-   McPherron A. C. & Lee S. J. (1997) Double muscling in cattle due to mutations in the myostatin gene. Proc Natl Acad Sci USA 94, 12457-61.
-   Meiklejohn C. D., Montooth K. L. & Rand D. M. (2007) Positive and negative selection on the mitochondrial genome. Trends Genet 23, 259-63.
-   Mosher D S, Quignon P, Bustamante C D, Sutter N B, Mellersh C S, Parker H G, Ostrander E A. A mutation in the myostatin gene increases muscle mass and enhances racing performance in heterozygote dogs. PLoS Genet. 2007 May 25;3(5):e79. Epub 2007 Apr. 30.
-   Polager S. & Ginsberg D. (2009) p53 and E2f: partners in life and death. Nat Rev Cancer 9, 738-48.
-   Purcell S. PLINK version 1.05. URL http://pngu.mgh.harvard.edu/purcell/plink/. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D., Maller J., Sklar P., de Bakker P. I., Daly M. J. & Sham P. C. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559-75.
-   Revington M. Haematology of the racing Thoroughbred in Australia 2: haematological values compared to performance. Equine Vet J. 1983 April;15(2):145-8. PMID: 6873047
-   Rivero J L, Ruz A, Marti-Korff S, Estepa J C, Aguilera-Tejero E, Werkman J, Sobotta M, Lindner A. Effects of intensity and duration of exercise on muscular responses to training of Thoroughbred racehorses. J Appl Physiol. 2007 May;102(5):1871-82. Epub 2007 Jan. 25. PMID: 17255370.
-   Rivero J.-L. L. & Piercy R. J. (2008) Muscle physiology: responses to exercise and training. In: Equine exercise physiology: the science of exercise in the athletic horse (eds. by Hinchcliff K W, Kaneps A J & Geor R J), pp. ix, 463 p. Elsevier Saunders, Edinburgh.
-   Rivero J. L., Serrano A. L., Henckel P. & Aguera E. (1993) Muscle fiber type composition and fiber size in successfully and unsuccessfully endurance-raced horses. J Appl Physiol 75, 1758-66.
-   Rozen S. & Skaletsky H. (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 132, 365-86.
-   Saleem A, Adhihetty P J, Hood D A. Role of p53 in mitochondrial biogenesis and apoptosis in skeletal muscle. Physiol Genomics. 2009 Mar. 3;37(1):58-66. Epub 2008 Dec. 23.
-   Sambrook, J. and D. Russell (2001). Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory.
-   Schuelke M., Wagner K. R., Stolz L. E., Hubner C., Riebel T., Komen W., Braun T., Tobin J. F. & Lee S. J. (2004) Myostatin mutation associated with gross muscle hypertrophy in a child. N Engl J Med 350, 2682-8.
-   Seaman J, Erickson B K, Kubo K, Hiraga A, Kai M, Yamaya Y, Wagner P D. Exercise induced ventilation/perfusion inequality in the horse. Equine Vet J. 1995 March;27(2):104-9. PMID: 7607141
-   Suzanne S. Eivers, Beatrice A. McGivney, Rita G. Fonseca, David E. MacHugh, Katie Menson, Stephen D. Park, Jose-Luis L. Rivero, Cormac T. Taylor, Lisa M. Katz and Emmeline W. Hill. Exercise-induced skeletal muscle gene expression in unconditioned and conditioned Thoroughbred horses and associations with physiological variables. Physiological Genomics, In Preparation (2009)
-   Taylor C T, Colgan S P. Therapeutic targets for hypoxia-elicited pathways. Pharm Res. 1999 October;16(10):1498-505. Review. PMID: 10554089.
-   Tabor H. K., Risch N. J. & Myers R. M. (2002) Opinion: Candidate-gene approaches for studying complex genetic traits: practical considerations. Nat Rev Genet 3, 391-7.
-   Thiel G., Kaufmann K., Magin A., Lietz M., Bach K. & Cramer M. (2000) The human transcriptional repressor protein NAB1: expression and biological activity. Biochim Biophys Acta 1493, 289-301.
-   Thomas M., Langley B., Berry C., Sharma M., Kirk S., Bass J. & Kambadur R. (2000) Myostatin, a negative regulator of muscle growth, functions by inhibiting myoblast proliferation. J Biol Chem 275, 40235-43.
-   van Baren M. J. & Heutink P. (2004) The PCR suite. Bioinformatics 20, 591-3.
-   van Deursen et al. 1993 Skeletal muscles of mice deficient in muscle creatine kinase lack burst activity. Cell 74: 621-631.
-   Wade C. M., Giulotto E., Sigurdsson S., Zoli M., Gnerre S., Imsland F., Lear T. L., Adelson D. L., Bailey E., Bellone R. R., Blocker H., Distl O., Edgar R. C., Garber M., Leeb T., Mauceli E., MacLeod J. N., Penedo M. C., Raison J. M., Sharpe T., Vogel J., Andersson L., Antczak D. F., Biagi T., Binns M. M., Chowdhary B. P., Coleman S. J., Della Valle G., Frye S., Guerin G., Hasegawa T., Hill E. W., Jurka J., Kiialainen A., Lindgren G., Liu J., Magnani E., Mickelson J. R., Murray J., Nergadze S. G., Onofrio R., Pedroni S., Piras M. F., Raudsepp T., Rocchi M., Roed K. H., Ryder O. A., Searle S., Skow L., Swinburne J. E., Syvanen A. C., Tozaki T., Valberg S. J., Vaudin M., White J. R., Zody M. C., Lander E. S. & Lindblad-Toh K. (2009) Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326, 865-7.
-   Weatherby and Sons (1791) An Introduction to a General Stud Book. Weatherby and Sons, London.
-   Weber K, Bruck P, Mikes Z, Küpper J H, Klingenspor M, Wiesner R J. Glucocorticoid hormone stimulates mitochondrial biogenesis specifically in skeletal muscle. Endocrinology. 2002 January;143(1):177-84.
-   Willett P. (1981) The classic racehorse. Stanley Paul, London.
-   Williamson S. A. & Beilharz R. G. (1998) The inheritance of speed, stamina and other racing performance characters in the Australian Thoroughbred. J Anim Breed Genet 115, 1-16.
-   Yang Q, Khoury M J, Botto L, Friedman J M, Flanders W D. Improving the prediction of complex diseases by testing for multiple disease-susceptibility genes. Am J Hum Genet. 2003 March;72(3):636-49. Epub 2003 Feb. 14.
-   Young L E, Rogers K, Wood J L. Left ventricular size and systolic function in Thoroughbred racehorses and their relationships to race performance. J Appl Physiol. 2005 October;99(4):1278-85. Epub 2005 May 26. PMID: 15920096
-   Zhou M., Wang Q., Sun J., Li X., Xu L., Yang H., Shi H., Ning S., Chen L., Li Y., He T. & Zheng Y. (2009) In silico detection and characteristics of novel microRNA genes in the Equus caballus genome using an integrated ab initio and comparative genomic approach. Genomics 94, 125-31.

1-18. (canceled)
 19. A method of training a Thoroughbred race horse for optimal racing distance, comprising the steps of: a) identifying a Thoroughbred race horse that is or may become sufficiently developed for race training, b) obtaining a biological sample from the horse, c) obtaining DNA from the sample and conducting a genotypic analysis to identify at least one genetic variant in the biological sample from the horse, and d) training the horse based on results of the analysis; wherein the genetic variant is in linkage disequilibrium with an MSTN-66493737 (T/C) single nucleotide polymorphism (SNP), and wherein: i) the horse has a homozygous genotype of the genetic variant and is trained to race as a sprinter, ii) the horse has a heterozygous genotype of the genetic variant and is trained to race over middle distances, or iii) the horse does not have the genetic variant and is trained to race as a stayer.
 20. The method of claim 19, wherein the horse is a two-year-old.
 21. The method of claim 19, wherein the genetic variant is in the MSTN gene region.
 22. The method of claim 19, wherein the genetic variant is in the MSTN gene flanking region.
 23. The method of claim 19, wherein the genetic variant is a SNP or an insertion polymorphism.
 24. The method of claim 23, wherein the genetic variant is a Chr18g.66495327Ins227bp66495326 polymorphism.
 25. The method of claim 19, wherein an r² value of linkage disequilibrium between the genetic variant and the MSTN-66493737 (T/C) SNP is at least 0.5.
 26. The method of claim 19, wherein the DNA from the sample is genomic DNA.
 27. The method of claim 19, wherein the biological sample is one or more of blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample, or skin of the horse.
 28. The method of claim 19, further comprising the steps of: a) extracting or releasing DNA from the biological sample, b) amplifying a target sequence or region in the DNA, c) identifying the MSTN-66493737 (T/C) SNP, and d) identifying the genetic variant in linkage disequilibrium with the MSTN-66493737 (T/C) SNP, wherein the target sequence or region comprises the MSTN gene region and/or the MSTN gene flanking region.
 29. A method of breeding a Thoroughbred race horse with elite athletic performance potential, comprising the steps of: a) obtaining a DNA sample from a Thoroughbred broodmare and conducting a genotypic analysis to identify at least one genetic variant in the DNA sample from the Thoroughbred broodmare, b) obtaining a DNA sample from a Thoroughbred stallion and conducting a genotypic analysis to identify at least one genetic variant in the DNA sample from the Thoroughbred stallion, and c) mating the broodmare with the stallion to produce a Thoroughbred offspring, wherein the genetic variant in the broodmare and in the stallion is in linkage disequilibrium with an MSTN-66493737 (T/C) SNP, and wherein: i) the broodmare and the stallion each have a homozygous genotype of the genetic variant and the offspring is bred to have elite sprinting performance potential, ii) the broodmare and the stallion each do not have the genetic variant and the offspring is bred to have stamina performance potential, iii) one of the broodmare and the stallion has a homozygous genotype of the genetic variant and the other horse in the mating pair has a heterozygous genotype of the genetic variant, and the offspring is bred to have either elite sprinting performance potential or middle distance racing performance potential, iv) one of the broodmare and the stallion has a homozygous genotype of the genetic variant and the other horse in the mating pair does not have the genetic variant, and the offspring is bred to have middle distance racing performance potential, or v) the broodmare and the stallion each have a heterozygous genotype of the genetic variant, and the offspring is bred to have elite sprinting performance potential, middle distance racing performance potential, or stamina performance potential.
 30. The method of claim 29, wherein the genetic variant is in the MSTN gene region.
 31. The method of claim 29, wherein the genetic variant is in the MSTN gene flanking region.
 32. The method of claim 29, wherein the genetic variant is a SNP or an insertion polymorphism.
 33. The method of claim 32, wherein the genetic variant is a Chr18g.66495327Ins227bp66495326 polymorphism.
 34. The method of claim 29, wherein an r² value of linkage disequilibrium between the genetic variant and the MSTN-66493737 (T/C) SNP is at least 0.5.
 35. The method of claim 29, wherein the DNA sample comprises genomic DNA.
 36. The method of claim 29, wherein the DNA sample of the broodmare and/or stallion is isolated from one or more of blood, saliva, skeletal muscle, hair, semen, bone marrow, soft tissue, internal organ biopsy sample, or skin of the horse.
 37. The method of claim 29, further comprising the steps of: a) amplifying a target sequence or region in the DNA sample of the broodmare and/or stallion, b) identifying the MSTN-66493737 (T/C) SNP, and c) identifying the genetic variant in linkage disequilibrium with the MSTN-66493737 (T/C) SNP, wherein the target sequence or region comprises the MSTN gene region and/or the MSTN gene flanking region.
 38. The method of claim 29, further comprising obtaining a DNA sample from the offspring and conducting a genotypic analysis to identify the genetic variant in the DNA sample from the offspring.
 39. The method of claim 38, wherein: i) the offspring has a homozygous genotype of the genetic variant and is trained to race as a sprinter, ii) the offspring has a heterozygous genotype of the genetic variant and is trained to race over middle distances, or iii) the offspring does not have the genetic variant and is trained to race as a stayer. 
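
The genotype-to-category mapping recited in claims 19 and 39, and the mating outcomes enumerated in claim 29, may be illustrated by the following minimal Python sketch. It is an illustration only and forms no part of the claims. The allele labels "V" (allele carrying the genetic variant) and "N" (allele lacking it), the function names, and the assumption of simple Mendelian inheritance with each parental allele transmitted with equal probability are all introduced here for illustration; the claims state only which categories are possible for each mating pair, not their proportions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict

# Hypothetical allele labels (not from the source):
#   "V" = allele carrying the genetic variant, "N" = allele lacking it.
# Category by number of variant copies, as recited in claims 19 and 39.
CATEGORY_BY_VARIANT_COPIES = {2: "sprinter", 1: "middle distance", 0: "stayer"}


def racing_category(genotype: str) -> str:
    """Map a two-allele genotype such as "VN" to a racing category."""
    if len(genotype) != 2 or set(genotype) - {"V", "N"}:
        raise ValueError(f"expected two alleles drawn from 'V'/'N', got {genotype!r}")
    return CATEGORY_BY_VARIANT_COPIES[genotype.count("V")]


def offspring_categories(dam: str, sire: str) -> Dict[str, Fraction]:
    """Expected category proportions for offspring of one mating pair.

    Assumes (for illustration) that each parent transmits either of its
    two alleles with probability 1/2, independently of the other parent.
    """
    # Enumerate the four equally likely allele combinations (Punnett square).
    counts = Counter(racing_category(a + b) for a, b in product(dam, sire))
    total = sum(counts.values())
    return {category: Fraction(n, total) for category, n in counts.items()}


if __name__ == "__main__":
    # Print the expected proportions for each distinct mating pair.
    for dam, sire in [("VV", "VV"), ("NN", "NN"), ("VV", "VN"), ("VV", "NN"), ("VN", "VN")]:
        print(dam, "x", sire, "->", offspring_categories(dam, sire))
```

Under these assumptions, the five mating pairs printed by the sketch yield the same sets of possible categories as items i) to v) of claim 29; for example, two heterozygous parents can produce offspring in any of the three categories.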